Introduction
This markdown document briefly presents the revised MitoImpute results. We noticed that HaploGrep2 is able to capture more haplogroups than HiMC, the haplogroup assignment method we were previously using. Therefore, we have generated results that display the old HiMC outputs alongside the new HaploGrep outputs. Additionally, I have included string distances from the ‘truth’ haplogroupings (assigned from the multiple sequence alignment), as well as HaploGrep quality scores.
Minor allele frequency experiments
This section will detail the minor allele frequency experiments.
## Rows: 387
## Columns: 71
## $ array <fct> BDCHP-1X10-HUMANHAP24…
## $ mcmc <chr> "MCMC1", "MCMC1", "MC…
## $ refpan_maf <ord> MAF1%, MAF1%, MAF1%, …
## $ k_hap <ord> kHAP500, kHAP500, kHA…
## $ imputed <lgl> TRUE, FALSE, FALSE, F…
## $ info_cutoff <dbl> 0.3, NA, NA, NA, NA, …
## $ n_snps_array <dbl> 309, NA, NA, NA, NA, …
## $ n_snps_imputed <dbl> 483, NA, NA, NA, NA, …
## $ n_snps_cutoff_imputed <dbl> 467, NA, NA, NA, NA, …
## $ n_type_0 <dbl> 181, NA, NA, NA, NA, …
## $ n_type_1 <dbl> 0, NA, NA, NA, NA, 0,…
## $ n_type_2 <dbl> 229, NA, NA, NA, NA, …
## $ n_type_3 <dbl> 73, NA, NA, NA, NA, 4…
## $ n_type_0_cutoff <dbl> 165, NA, NA, NA, NA, …
## $ n_type_1_cutoff <dbl> 0, NA, NA, NA, NA, 0,…
## $ n_type_2_cutoff <dbl> 229, NA, NA, NA, NA, …
## $ n_type_3_cutoff <dbl> 73, NA, NA, NA, NA, 4…
## $ mean_info <dbl> 0.8791739, NA, NA, NA…
## $ mean_info_cutoff <dbl> 0.9037966, NA, NA, NA…
## $ mean_maf <dbl> 0.06190269, NA, NA, N…
## $ mean_maf_cutoff <dbl> 0.06381799, NA, NA, N…
## $ mean_mcc <dbl> 0.8179815, NA, NA, NA…
## $ mean_mcc_cutoff <dbl> 0.8727745, NA, NA, NA…
## $ mean_concordance <dbl> 0.9958531, NA, NA, NA…
## $ mean_concordance_cutoff <dbl> 0.9959055, NA, NA, NA…
## $ mean_certainty <dbl> 0.9973703, NA, NA, NA…
## $ mean_certainty_cutoff <dbl> 0.9974721, NA, NA, NA…
## $ mean_himc_concordance_typed <dbl> 0.9806553, NA, NA, NA…
## $ mean_himc_concordance_typed_macro <dbl> 0.9936834, NA, NA, NA…
## $ mean_himc_concordance_imputed <dbl> 0.9885511, NA, NA, NA…
## $ mean_himc_concordance_imputed_cutoff <dbl> 0.9885511, NA, NA, NA…
## $ mean_himc_concordance_imputed_macro <dbl> 0.9984208, NA, NA, NA…
## $ mean_himc_concordance_imputed_macro_cutoff <dbl> 0.9984208, NA, NA, NA…
## $ mean_haplogrep_concordance_typed <dbl> 0.3062352, NA, NA, NA…
## $ mean_haplogrep_concordance_typed_macro <dbl> 0.9932912, NA, NA, NA…
## $ mean_haplogrep_concordance_imputed <dbl> 0.2841358, NA, NA, NA…
## $ mean_haplogrep_concordance_imputed_cutoff <dbl> 0.2892660, NA, NA, NA…
## $ mean_haplogrep_concordance_imputed_macro <dbl> 0.9940805, NA, NA, NA…
## $ mean_haplogrep_concordance_imputed_macro_cutoff <dbl> 0.9940805, NA, NA, NA…
## $ mean_haplogrep_quality_truth <dbl> 0.8560609, NA, NA, NA…
## $ mean_haplogrep_quality_typed <dbl> 0.9822484, NA, NA, NA…
## $ mean_haplogrep_quality_imputed <dbl> 0.9785349, NA, NA, NA…
## $ mean_haplogrep_quality_imputed_cutoff <dbl> 0.9789348, NA, NA, NA…
## $ mean_haplogrep_distance_dl_typed <dbl> 1.865430, NA, NA, NA,…
## $ mean_haplogrep_distance_dl_imputed <dbl> 2.160616, NA, NA, NA,…
## $ mean_haplogrep_distance_dl_imputed_cutoff <dbl> 2.123125, NA, NA, NA,…
## $ mean_haplogrep_distance_lv_typed <dbl> 1.865430, NA, NA, NA,…
## $ mean_haplogrep_distance_lv_imputed <dbl> 2.160616, NA, NA, NA,…
## $ mean_haplogrep_distance_lv_imputed_cutoff <dbl> 2.123125, NA, NA, NA,…
## $ mean_haplogrep_distance_jc_typed <dbl> 0.2800019, NA, NA, NA…
## $ mean_haplogrep_distance_jc_imputed <dbl> 0.3019733, NA, NA, NA…
## $ mean_haplogrep_distance_jc_imputed_cutoff <dbl> 0.2982575, NA, NA, NA…
## $ himc_diff <dbl> 0.007895776, NA, NA, …
## $ himc_cutoff_diff <dbl> 0.007895776, NA, NA, …
## $ himc_macro_diff <dbl> 0.004737465, NA, NA, …
## $ himc_macro_cutoff_diff <dbl> 0.004737465, NA, NA, …
## $ haplogrep_diff <dbl> -0.022099448, NA, NA,…
## $ haplogrep_cutoff_diff <dbl> -0.016969219, NA, NA,…
## $ haplogrep_macro_diff <dbl> 0.000789266, NA, NA, …
## $ haplogrep_macro_cutoff_diff <dbl> 0.000789266, NA, NA, …
## $ haplogrep_quality_diff <dbl> -0.003713536, NA, NA,…
## $ haplogrep_quality_cutoff_diff <dbl> -0.003313575, NA, NA,…
## $ haplogrep_quality_diff_truth_typed <dbl> -0.1261875, NA, NA, N…
## $ haplogrep_quality_diff_truth_imputed <dbl> -0.1224739, NA, NA, N…
## $ haplogrep_quality_diff_truth_imputed_cutoff <dbl> -0.1228739, NA, NA, N…
## $ haplogrep_distance_dl_diff <dbl> 0.2951855, NA, NA, NA…
## $ haplogrep_distance_dl_cutoff_diff <dbl> 0.2576953, NA, NA, NA…
## $ haplogrep_distance_lv_diff <dbl> 0.2951855, NA, NA, NA…
## $ haplogrep_distance_lv_cutoff_diff <dbl> 0.2576953, NA, NA, NA…
## $ haplogrep_distance_jc_diff <dbl> 0.0219713276, NA, NA,…
## $ haplogrep_distance_jc_cutoff_diff <dbl> 0.018255556, NA, NA, …
HiMC
HiMC Haplogrouping
We previously found that imputing missing variants increased the accuracy of haplogroup assignments when using HiMC to assign haplogroups.
Compare this result with the imputed data, which shows a higher haplogroup concordance:
If the improvement in accurate assignment of haplogroups wasn’t evident from the last two plots, displaying the mean difference should make this clear:
Table showing the residuals for the linear model testing for significant difference in the means of imputed haplogroup concordance

| Term | Df | Sum Sq | Mean Sq | F value | p value |
|---|---|---|---|---|---|
| refpan_maf | 2 | 0.0029833 | 0.0014916 | 0.0441965 | 0.9567721 |
| Residuals | 304 | 10.2600244 | 0.0337501 | NA | NA |

Table showing the estimated marginal means for the linear model testing for significant difference in the means of imputed haplogroup concordance for different Reference Panel minor allele frequency filtering thresholds

| Reference panel MAF | Estimated marginal mean | SE | Df | Lower CL | Upper CL |
|---|---|---|---|---|---|
| MAF1% | 0.8600744 | 0.0182800 | 304 | 0.8241030 | 0.8960458 |
| MAF0.5% | 0.8551735 | 0.0181017 | 304 | 0.8195530 | 0.8907939 |
| MAF0.1% | 0.8525313 | 0.0181017 | 304 | 0.8169108 | 0.8881517 |

Table showing the contrasts for the linear model testing for significant difference in the means of imputed haplogroup concordance for different Reference Panel minor allele frequency filtering thresholds

| Contrast | Estimate | SE | Df | t ratio | p value |
|---|---|---|---|---|---|
| MAF1% - MAF0.5% | 0.0049010 | 0.0257261 | 304 | 0.1905065 | 0.9801924 |
| MAF1% - MAF0.1% | 0.0075432 | 0.0257261 | 304 | 0.2932113 | 0.9537218 |
| MAF0.5% - MAF0.1% | 0.0026422 | 0.0255996 | 304 | 0.1032120 | 0.9941443 |
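As a quick sanity check on how the reported values relate to one another, the sketch below recomputes two test statistics from the numbers above. It is written in Python purely for illustration, and the function and variable names are my own. It assumes the usual definitions: the F value is the term mean square divided by the residual mean square, and the t ratio is the contrast estimate divided by its standard error.

```python
def f_value(mean_sq_term: float, mean_sq_resid: float) -> float:
    """F statistic as the ratio of term to residual mean squares."""
    return mean_sq_term / mean_sq_resid


def t_ratio(estimate: float, std_error: float) -> float:
    """t statistic for a contrast: estimate divided by its standard error."""
    return estimate / std_error


# Values copied from the HiMC imputed haplogroup concordance tables.
f_maf = f_value(0.0014916, 0.0337501)   # refpan_maf vs residual mean square
t_first = t_ratio(0.0049010, 0.0257261)  # MAF1% - MAF0.5% contrast

# Small discrepancies from the reported values are expected because the
# inputs are already rounded to seven decimal places.
print(f"F = {f_maf:.4f}, t = {t_first:.4f}")
```

Both recomputed values agree with the reported statistics to within rounding of the inputs.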
Table showing the residuals for the linear model testing for significant difference in the mean concordance of assigned haplogroups between genotyped and imputed data

| Term | Df | Sum Sq | Mean Sq | F value | p value |
|---|---|---|---|---|---|
| refpan_maf | 2 | 0.0054205 | 0.0027102 | 0.1366648 | 0.8723168 |
| Residuals | 300 | 5.9493818 | 0.0198313 | NA | NA |

Table showing the estimated marginal means for the linear model testing for significant difference in the mean concordance of assigned haplogroups between genotyped and imputed data for different Reference Panel minor allele frequency filtering thresholds

| Reference panel MAF | Estimated marginal mean | SE | Df | Lower CL | Upper CL |
|---|---|---|---|---|---|
| MAF1% | 0.3103129 | 0.0140125 | 300 | 0.2827377 | 0.3378881 |
| MAF0.5% | 0.3203330 | 0.0140125 | 300 | 0.2927579 | 0.3479082 |
| MAF0.1% | 0.3176033 | 0.0140125 | 300 | 0.2900282 | 0.3451785 |

Table showing the contrasts for the linear model testing for significant difference in the mean concordance of assigned haplogroups between genotyped and imputed data for different Reference Panel minor allele frequency filtering thresholds

| Contrast | Estimate | SE | Df | t ratio | p value |
|---|---|---|---|---|---|
| MAF1% - MAF0.5% | -0.0100201 | 0.0198166 | 300 | -0.5056419 | 0.8686475 |
| MAF1% - MAF0.1% | -0.0072904 | 0.0198166 | 300 | -0.3678945 | 0.9281329 |
| MAF0.5% - MAF0.1% | 0.0027297 | 0.0198166 | 300 | 0.1377474 | 0.9895942 |
HiMC Macrohaplogrouping
The same trend can be seen when only macrohaplogroups are considered:
Compare this result with the imputed data, which shows a higher haplogroup concordance:
If the improvement in accurate assignment of haplogroups wasn’t evident from the last two plots, displaying the mean difference should make this clear:
These can be statistically tested with linear models:
Table showing the residuals for the linear model testing for significant difference in the means of imputed macrohaplogroup concordance

| Term | Df | Sum Sq | Mean Sq | F value | p value |
|---|---|---|---|---|---|
| refpan_maf | 2 | 0.0059374 | 0.0029687 | 0.0926436 | 0.911544 |
| Residuals | 304 | 9.7415101 | 0.0320444 | NA | NA |

Table showing the estimated marginal means for the linear model testing for significant difference in the means of imputed macrohaplogroup concordance for different Reference Panel minor allele frequency filtering thresholds

| Reference panel MAF | Estimated marginal mean | SE | Df | Lower CL | Upper CL |
|---|---|---|---|---|---|
| MAF1% | 0.8938627 | 0.0178121 | 304 | 0.8588120 | 0.9289133 |
| MAF0.5% | 0.8883512 | 0.0176383 | 304 | 0.8536425 | 0.9230599 |
| MAF0.1% | 0.8830726 | 0.0176383 | 304 | 0.8483639 | 0.9177813 |

Table showing the contrasts for the linear model testing for significant difference in the means of imputed macrohaplogroup concordance for different Reference Panel minor allele frequency filtering thresholds

| Contrast | Estimate | SE | Df | t ratio | p value |
|---|---|---|---|---|---|
| MAF1% - MAF0.5% | 0.0055115 | 0.0250676 | 304 | 0.2198638 | 0.9737057 |
| MAF1% - MAF0.1% | 0.0107901 | 0.0250676 | 304 | 0.4304400 | 0.9029618 |
| MAF0.5% - MAF0.1% | 0.0052786 | 0.0249444 | 304 | 0.2116161 | 0.9756173 |
Table showing the residuals for the linear model testing for significant difference in the mean concordance of assigned macrohaplogroups between genotyped and imputed data

| Term | Df | Sum Sq | Mean Sq | F value | p value |
|---|---|---|---|---|---|
| refpan_maf | 2 | 0.0043381 | 0.0021691 | 0.0892709 | 0.9146221 |
| Residuals | 300 | 7.2892487 | 0.0242975 | NA | NA |

Table showing the estimated marginal means for the linear model testing for significant difference in the mean concordance of assigned macrohaplogroups between genotyped and imputed data for different Reference Panel minor allele frequency filtering thresholds

| Reference panel MAF | Estimated marginal mean | SE | Df | Lower CL | Upper CL |
|---|---|---|---|---|---|
| MAF1% | 0.2363714 | 0.0155103 | 300 | 0.2058486 | 0.2668942 |
| MAF0.5% | 0.2455937 | 0.0155103 | 300 | 0.2150710 | 0.2761165 |
| MAF0.1% | 0.2401832 | 0.0155103 | 300 | 0.2096605 | 0.2707060 |

Table showing the contrasts for the linear model testing for significant difference in the mean concordance of assigned macrohaplogroups between genotyped and imputed data for different Reference Panel minor allele frequency filtering thresholds

| Contrast | Estimate | SE | Df | t ratio | p value |
|---|---|---|---|---|---|
| MAF1% - MAF0.5% | -0.0092223 | 0.0219349 | 300 | -0.4204415 | 0.9072014 |
| MAF1% - MAF0.1% | -0.0038118 | 0.0219349 | 300 | -0.1737785 | 0.9834902 |
| MAF0.5% - MAF0.1% | 0.0054105 | 0.0219349 | 300 | 0.2466630 | 0.9670199 |
These results suggest that there is no statistically significant difference in accurate assignment of haplogroups or macrohaplogroups between different Reference Panel minor allele frequency filtering thresholds.
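To make the two measures compared above concrete, here is a minimal Python sketch of haplogroup concordance and macrohaplogroup concordance. It is illustrative only and not the code used to produce these results. The function names are hypothetical, the example labels are invented, and treating the macrohaplogroup as the leading letter of the label is an assumption that the actual pipeline may not share.

```python
from typing import Sequence


def macrohaplogroup(label: str) -> str:
    """Collapse a haplogroup label to its leading letter.

    Assumption for illustration only: the real pipeline may define
    macrohaplogroups differently (for example, keeping L0-L3 distinct).
    """
    return label[:1]


def concordance(truth: Sequence[str], called: Sequence[str],
                macro: bool = False) -> float:
    """Fraction of samples whose called label exactly matches the truth."""
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")
    if not truth:
        raise ValueError("at least one sample is required")
    if macro:
        truth = [macrohaplogroup(t) for t in truth]
        called = [macrohaplogroup(c) for c in called]
    matches = sum(t == c for t, c in zip(truth, called))
    return matches / len(truth)


# Hypothetical labels, not taken from the MitoImpute data.
truth = ["H1a1", "U5b2", "L3e1", "J1c2"]
typed = ["H1", "U5b2", "L3e1", "J2"]
imputed = ["H1a1", "U5b2", "L3e", "J1c2"]

print(concordance(truth, typed))              # 0.5
print(concordance(truth, imputed))            # 0.75
print(concordance(truth, typed, macro=True))  # 1.0
```

In this toy example the exact-match concordance differs between the typed and imputed calls, while the macrohaplogroup concordance is perfect for both, which is the kind of pattern the macrohaplogroup comparison is meant to expose.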
HaploGrep 2.0
HaploGrep Haplogrouping
We are investigating HaploGrep 2.0 for haplogroup assignment, as it can assign haplogroups across a wider range of sub-groupings than HiMC.
Compare this result with the imputed data, which shows a higher haplogroup concordance:
If the improvement in accurate assignment of haplogroups wasn’t evident from the last two plots, displaying the mean difference should make this clear:
Table showing the residuals for the linear model testing for significant difference in the means of imputed haplogroup concordance

| Term | Df | Sum Sq | Mean Sq | F value | p value |
|---|---|---|---|---|---|
| refpan_maf | 2 | 0.0976431 | 0.0488216 | 4.789485 | 0.0089547 |
| Residuals | 304 | 3.0988201 | 0.0101935 | NA | NA |

Table showing the estimated marginal means for the linear model testing for significant difference in the means of imputed haplogroup concordance for different Reference Panel minor allele frequency filtering thresholds

| Reference panel MAF | Estimated marginal mean | SE | Df | Lower CL | Upper CL |
|---|---|---|---|---|---|
| MAF1% | 0.1672931 | 0.0100462 | 304 | 0.1475243 | 0.1870620 |
| MAF0.5% | 0.1872476 | 0.0099482 | 304 | 0.1676716 | 0.2068236 |
| MAF0.1% | 0.2109831 | 0.0099482 | 304 | 0.1914071 | 0.2305590 |

Table showing the contrasts for the linear model testing for significant difference in the means of imputed haplogroup concordance for different Reference Panel minor allele frequency filtering thresholds

| Contrast | Estimate | SE | Df | t ratio | p value |
|---|---|---|---|---|---|
| MAF1% - MAF0.5% | -0.0199545 | 0.0141383 | 304 | -1.411377 | 0.3362773 |
| MAF1% - MAF0.1% | -0.0436899 | 0.0141383 | 304 | -3.090183 | 0.0061722 |
| MAF0.5% - MAF0.1% | -0.0237355 | 0.0140688 | 304 | -1.687096 | 0.2117763 |
Table showing the residuals for the linear model testing for significant difference in the mean concordance of assigned haplogroups between genotyped and imputed data

| Term | Df | Sum Sq | Mean Sq | F value | p value |
|---|---|---|---|---|---|
| refpan_maf | 2 | 0.1160262 | 0.0580131 | 163.6211 | 0 |
| Residuals | 304 | 0.1077856 | 0.0003546 | NA | NA |

Table showing the estimated marginal means for the linear model testing for significant difference in the mean concordance of assigned haplogroups between genotyped and imputed data for different Reference Panel minor allele frequency filtering thresholds

| Reference panel MAF | Estimated marginal mean | SE | Df | Lower CL | Upper CL |
|---|---|---|---|---|---|
| MAF1% | -0.0395844 | 0.0018736 | 304 | -0.0432713 | -0.0358975 |
| MAF0.5% | -0.0156206 | 0.0018553 | 304 | -0.0192715 | -0.0119696 |
| MAF0.1% | 0.0081149 | 0.0018553 | 304 | 0.0044639 | 0.0117658 |

Table showing the contrasts for the linear model testing for significant difference in the mean concordance of assigned haplogroups between genotyped and imputed data for different Reference Panel minor allele frequency filtering thresholds

| Contrast | Estimate | SE | Df | t ratio | p value |
|---|---|---|---|---|---|
| MAF1% - MAF0.5% | -0.0239639 | 0.0026368 | 304 | -9.088190 | 0 |
| MAF1% - MAF0.1% | -0.0476993 | 0.0026368 | 304 | -18.089758 | 0 |
| MAF0.5% - MAF0.1% | -0.0237355 | 0.0026239 | 304 | -9.046021 | 0 |
HaploGrep Macrohaplogrouping
The same trend can be seen when only macrohaplogroups are considered:
Compare this result with the imputed data, which shows a higher haplogroup concordance:
If the improvement in accurate assignment of haplogroups wasn’t evident from the last two plots, displaying the mean difference should make this clear:
These can be statistically tested with linear models:
Table showing the residuals for the linear model testing for significant difference in the means of imputed macrohaplogroup concordance

| Term | Df | Sum Sq | Mean Sq | F value | p value |
|---|---|---|---|---|---|
| refpan_maf | 2 | 0.0012794 | 0.0006397 | 0.019601 | 0.9805911 |
| Residuals | 304 | 9.9213982 | 0.0326362 | NA | NA |

Table showing the estimated marginal means for the linear model testing for significant difference in the means of imputed macrohaplogroup concordance for different Reference Panel minor allele frequency filtering thresholds

| Reference panel MAF | Estimated marginal mean | SE | Df | Lower CL | Upper CL |
|---|---|---|---|---|---|
| MAF1% | 0.8842553 | 0.0179758 | 304 | 0.8488825 | 0.9196281 |
| MAF0.5% | 0.8792615 | 0.0178005 | 304 | 0.8442338 | 0.9142892 |
| MAF0.1% | 0.8813994 | 0.0178005 | 304 | 0.8463717 | 0.9164271 |

Table showing the contrasts for the linear model testing for significant difference in the means of imputed macrohaplogroup concordance for different Reference Panel minor allele frequency filtering thresholds

| Contrast | Estimate | SE | Df | t ratio | p value |
|---|---|---|---|---|---|
| MAF1% - MAF0.5% | 0.0049939 | 0.0252980 | 304 | 0.1974015 | 0.9787485 |
| MAF1% - MAF0.1% | 0.0028559 | 0.0252980 | 304 | 0.1128921 | 0.9929985 |
| MAF0.5% - MAF0.1% | -0.0021379 | 0.0251736 | 304 | -0.0849267 | 0.9960315 |
Table showing the residuals for the linear model testing for significant difference in the mean concordance of assigned macrohaplogroups between genotyped and imputed data

| Term | Df | Sum Sq | Mean Sq | F value | p value |
|---|---|---|---|---|---|
| refpan_maf | 2 | 0.0085920 | 0.0042960 | 2.340219 | 0.0980394 |
| Residuals | 304 | 0.5580574 | 0.0018357 | NA | NA |

Table showing the estimated marginal means for the linear model testing for significant difference in the mean concordance of assigned macrohaplogroups between genotyped and imputed data for different Reference Panel minor allele frequency filtering thresholds

| Reference panel MAF | Estimated marginal mean | SE | Df | Lower CL | Upper CL |
|---|---|---|---|---|---|
| MAF1% | 0.0021255 | 0.0042633 | 304 | -0.0062637 | 0.0105148 |
| MAF0.5% | 0.0121608 | 0.0042217 | 304 | 0.0038534 | 0.0204682 |
| MAF0.1% | 0.0142987 | 0.0042217 | 304 | 0.0059914 | 0.0226061 |

Table showing the contrasts for the linear model testing for significant difference in the mean concordance of assigned macrohaplogroups between genotyped and imputed data for different Reference Panel minor allele frequency filtering thresholds

| Contrast | Estimate | SE | Df | t ratio | p value |
|---|---|---|---|---|---|
| MAF1% - MAF0.5% | -0.0100353 | 0.0059998 | 304 | -1.6725957 | 0.2174178 |
| MAF1% - MAF0.1% | -0.0121732 | 0.0059998 | 304 | -2.0289253 | 0.1070455 |
| MAF0.5% - MAF0.1% | -0.0021379 | 0.0059703 | 304 | -0.3580893 | 0.9317786 |
HaploGrep Haplogrouping (with info > 0.3 cutoff)
It should be noted that, by convention, imputed variants with an IMPUTE2 info score of info <= 0.3 are excluded from the final datasets. As such, I have also displayed these results after excluding any imputed sites with an info score of info <= 0.3.
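The cutoff described above can be sketched as a simple filter. The Python below is only an illustration under stated assumptions: the record layout (a list of dictionaries with `pos` and `info` keys), the function name, and the example values are all hypothetical and are not taken from the IMPUTE2 output format or from this analysis.

```python
from statistics import mean

INFO_CUTOFF = 0.3  # sites with info <= 0.3 are excluded


def filter_by_info(variants, cutoff=INFO_CUTOFF):
    """Keep only variants whose info score is strictly above the cutoff."""
    return [v for v in variants if v["info"] > cutoff]


# Hypothetical imputed sites, for illustration only.
variants = [
    {"pos": 73, "info": 0.95},
    {"pos": 150, "info": 0.30},  # exactly at the cutoff, so excluded
    {"pos": 263, "info": 0.12},
    {"pos": 750, "info": 0.81},
]

kept = filter_by_info(variants)
print([v["pos"] for v in kept])
print(mean(v["info"] for v in variants), mean(v["info"] for v in kept))
```

Because low-info sites are removed, the mean info score of the retained sites can only stay the same or rise, which is worth keeping in mind when comparing the filtered and unfiltered summaries.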
Imputed haplogroup concordance (sites with info <= 0.3 excluded):
Difference in haplogroup concordance between genotyped and imputed datasets (sites with info <= 0.3 excluded):
Table showing the residuals for the linear model testing for significant difference in the means of imputed haplogroup concordance

| Term | Df | Sum Sq | Mean Sq | F value | p value |
|---|---|---|---|---|---|
| refpan_maf | 2 | 0.0254557 | 0.0127278 | 1.252717 | 0.2872478 |
| Residuals | 294 | 2.9870905 | 0.0101602 | NA | NA |

Table showing the estimated marginal means for the linear model testing for significant difference in the means of imputed haplogroup concordance for different Reference Panel minor allele frequency filtering thresholds

| Reference panel MAF | Estimated marginal mean | SE | Df | Lower CL | Upper CL |
|---|---|---|---|---|---|
| MAF1% | 0.1822736 | 0.0100297 | 294 | 0.1625344 | 0.2020127 |
| MAF0.5% | 0.1972935 | 0.0099319 | 294 | 0.1777469 | 0.2168401 |
| MAF0.1% | 0.2046356 | 0.0104522 | 294 | 0.1840649 | 0.2252062 |

Table showing the contrasts for the linear model testing for significant difference in the means of imputed haplogroup concordance for different Reference Panel minor allele frequency filtering thresholds

| Contrast | Estimate | SE | Df | t ratio | p value |
|---|---|---|---|---|---|
| MAF1% - MAF0.5% | -0.0150200 | 0.0141152 | 294 | -1.0640995 | 0.5371477 |
| MAF1% - MAF0.1% | -0.0223620 | 0.0144860 | 294 | -1.5436953 | 0.2720877 |
| MAF0.5% - MAF0.1% | -0.0073421 | 0.0144184 | 294 | -0.5092128 | 0.8669185 |
Table showing the residuals for the linear model testing for significant difference in the mean concordance of assigned haplogroups between genotyped and imputed data

| Term | Df | Sum Sq | Mean Sq | F value | p value |
|---|---|---|---|---|---|
| refpan_maf | 2 | 0.0739095 | 0.0369548 | 129.5151 | 0 |
| Residuals | 294 | 0.0838876 | 0.0002853 | NA | NA |

Table showing the estimated marginal means for the linear model testing for significant difference in the mean concordance of assigned haplogroups between genotyped and imputed data for different Reference Panel minor allele frequency filtering thresholds

| Reference panel MAF | Estimated marginal mean | SE | Df | Lower CL | Upper CL |
|---|---|---|---|---|---|
| MAF1% | -0.0246040 | 0.0016808 | 294 | -0.0279119 | -0.0212961 |
| MAF0.5% | -0.0055747 | 0.0016644 | 294 | -0.0088503 | -0.0022990 |
| MAF0.1% | 0.0144649 | 0.0017516 | 294 | 0.0110176 | 0.0179122 |

Table showing the contrasts for the linear model testing for significant difference in the mean concordance of assigned haplogroups between genotyped and imputed data for different Reference Panel minor allele frequency filtering thresholds

| Contrast | Estimate | SE | Df | t ratio | p value |
|---|---|---|---|---|---|
| MAF1% - MAF0.5% | -0.0190293 | 0.0023654 | 294 | -8.044750 | 0 |
| MAF1% - MAF0.1% | -0.0390689 | 0.0024276 | 294 | -16.093751 | 0 |
| MAF0.5% - MAF0.1% | -0.0200396 | 0.0024163 | 294 | -8.293641 | 0 |
HaploGrep Macrohaplogrouping (with info > 0.3 cutoff)
The same trend can be seen when only macrohaplogroups are considered:
Compare this result with the imputed data, which shows a higher haplogroup concordance:
If the improvement in accurate assignment of haplogroups wasn’t evident from the last two plots, displaying the mean difference should make this clear:
These can be statistically tested with linear models:
Table showing the residuals for the linear model testing for significant difference in the means of imputed macrohaplogroup concordance

| Term | Df | Sum Sq | Mean Sq | F value | p value |
|---|---|---|---|---|---|
| refpan_maf | 2 | 0.0063379 | 0.0031690 | 0.1089308 | 0.8968286 |
| Residuals | 294 | 8.5529105 | 0.0290915 | NA | NA |

Table showing the estimated marginal means for the linear model testing for significant difference in the means of imputed macrohaplogroup concordance for different Reference Panel minor allele frequency filtering thresholds

| Reference panel MAF | Estimated marginal mean | SE | Df | Lower CL | Upper CL |
|---|---|---|---|---|---|
| MAF1% | 0.8900459 | 0.0169716 | 294 | 0.8566447 | 0.9234471 |
| MAF0.5% | 0.8839626 | 0.0168060 | 294 | 0.8508872 | 0.9170379 |
| MAF0.1% | 0.8786272 | 0.0176865 | 294 | 0.8438190 | 0.9134354 |

Table showing the contrasts for the linear model testing for significant difference in the means of imputed macrohaplogroup concordance for different Reference Panel minor allele frequency filtering thresholds

| Contrast | Estimate | SE | Df | t ratio | p value |
|---|---|---|---|---|---|
| MAF1% - MAF0.5% | 0.0060833 | 0.0238847 | 294 | 0.2546947 | 0.9648766 |
| MAF1% - MAF0.1% | 0.0114186 | 0.0245122 | 294 | 0.4658356 | 0.8873319 |
| MAF0.5% - MAF0.1% | 0.0053354 | 0.0243978 | 294 | 0.2186813 | 0.9739841 |
Table showing the residuals for the linear model testing for significant difference in the mean concordance of assigned macrohaplogroups between genotyped and imputed data

| Term | Df | Sum Sq | Mean Sq | F value | p value |
|---|---|---|---|---|---|
| refpan_maf | 2 | 0.0098631 | 0.0049316 | 2.410029 | 0.0915852 |
| Residuals | 294 | 0.6016025 | 0.0020463 | NA | NA |

Table showing the estimated marginal means for the linear model testing for significant difference in the mean concordance of assigned macrohaplogroups between genotyped and imputed data for different Reference Panel minor allele frequency filtering thresholds

| Reference panel MAF | Estimated marginal mean | SE | Df | Lower CL | Upper CL |
|---|---|---|---|---|---|
| MAF1% | 0.0079161 | 0.0045011 | 294 | -0.0009424 | 0.0167746 |
| MAF0.5% | 0.0168619 | 0.0044572 | 294 | 0.0080899 | 0.0256340 |
| MAF0.1% | 0.0219469 | 0.0046907 | 294 | 0.0127152 | 0.0311785 |

Table showing the contrasts for the linear model testing for significant difference in the mean concordance of assigned macrohaplogroups between genotyped and imputed data for different Reference Panel minor allele frequency filtering thresholds

| Contrast | Estimate | SE | Df | t ratio | p value |
|---|---|---|---|---|---|
| MAF1% - MAF0.5% | -0.0089458 | 0.0063346 | 294 | -1.4122253 | 0.3358850 |
| MAF1% - MAF0.1% | -0.0140308 | 0.0065010 | 294 | -2.1582526 | 0.0802339 |
| MAF0.5% - MAF0.1% | -0.0050850 | 0.0064707 | 294 | -0.7858469 | 0.7120262 |
These results suggest that there is a statistically significant difference in accurate assignment of haplogroups between different Reference Panel minor allele frequency filtering thresholds. However, this improvement is tiny; therefore, the biological and practical significance of the improvement seems small.
These results suggest that there is no statistically significant difference in accurate assignment of macrohaplogroups between different Reference Panel minor allele frequency filtering thresholds. However, it should be noted that both the genotyped and imputed datasets allow HaploGrep to accurately call macrohaplogroups, with average accuracy in the high 80%s.
There is a slight increase in the ability to accurately call haplogroups when a filter of info > 0.3 is applied, but the biological and practical significance of the improvement again seems small.
HaploGrep haplogroup quality comparisons
We also examined the difference in HaploGrep’s quality score between the truthset, genotyped set, and imputed set.
Here I show the difference between the truth set and the genotyped set:
Here I show the difference between the truth set and the imputed set:
Here I show the difference between the truth set and the imputed set with the info score filter info > 0.3:
Here it appears that relative to the truth set, the quality is still decreased.
However, I have also investigated the difference between the genotyped and imputed datasets to see if there is any improvement. I have only investigated the imputed dataset filtered with info > 0.3.
On average, there is a decrease in HaploGrep quality score.
HaploGrep string distance (Damerau-Levenshtein)
We also examined the distance between the strings of assigned haplogroups, as measures of haplogroup concordance may be misleading if only one sub-haplogroup level is incorrectly assigned. We used a few different measures, as different measures of distance will provide different results. All results are between the genotyped dataset and the imputed dataset with an info filter of info > 0.3.
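For readers unfamiliar with edit distances, the following Python sketch shows how such a distance between two haplogroup labels can be computed. It is not the implementation used for these results. With `transpositions=False` it computes the Levenshtein distance; with `transpositions=True` it computes the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance, which may differ from the unrestricted form in rare cases. The function name and the example labels are hypothetical.

```python
def edit_distance(a: str, b: str, transpositions: bool = False) -> int:
    """Dynamic-programming edit distance between two strings.

    Counts insertions, deletions and substitutions. When `transpositions`
    is True, a swap of two adjacent characters also counts as one edit
    (restricted Damerau-Levenshtein, also called optimal string alignment).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# Hypothetical haplogroup labels.
print(edit_distance("H1a1", "H1a"))                      # 1
print(edit_distance("U5b2", "U5a1"))                     # 2
print(edit_distance("H1a", "Ha1"))                       # 2
print(edit_distance("H1a", "Ha1", transpositions=True))  # 1
```

The two measures differ only when adjacent characters are swapped, which is probably uncommon between haplogroup labels; that would be consistent with the two distances giving near-identical values in this report.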
This result shows the Damerau-Levenshtein distance:
Table showing the residuals for the linear model testing for significant difference in the Damerau-Levenshtein string distance between assigned haplogroups

| Term | Df | Sum Sq | Mean Sq | F value | p value |
|---|---|---|---|---|---|
| refpan_maf | 2 | 4.239264 | 2.1196323 | 15.10913 | 6e-07 |
| Residuals | 294 | 41.244733 | 0.1402882 | NA | NA |

Table showing the estimated marginal means for the linear model testing for significant difference in the mean Damerau-Levenshtein string distance between assigned haplogroups for different Reference Panel minor allele frequency filtering thresholds

| Reference panel MAF | Estimated marginal mean | SE | Df | Lower CL | Upper CL |
|---|---|---|---|---|---|
| MAF1% | 0.3900615 | 0.0372692 | 294 | 0.3167133 | 0.4634097 |
| MAF0.5% | 0.1255738 | 0.0369056 | 294 | 0.0529412 | 0.1982063 |
| MAF0.1% | 0.1539571 | 0.0388391 | 294 | 0.0775192 | 0.2303950 |

Table showing the contrasts for the linear model testing for significant difference in the mean Damerau-Levenshtein string distance between assigned haplogroups for different Reference Panel minor allele frequency filtering thresholds

| Contrast | Estimate | SE | Df | t ratio | p value |
|---|---|---|---|---|---|
| MAF1% - MAF0.5% | 0.2644877 | 0.0524501 | 294 | 5.0426543 | 0.0000024 |
| MAF1% - MAF0.1% | 0.2361044 | 0.0538281 | 294 | 4.3862644 | 0.0000477 |
| MAF0.5% - MAF0.1% | -0.0283833 | 0.0535770 | 294 | -0.5297671 | 0.8567930 |
HaploGrep string distance (Levenshtein)
We also examined the distance between the strings of assigned haplogroups, as measures of haplogroup concordance may be misleading if only one sub-haplogroup level is incorrectly assigned. We used a few different measures, as different measures of distance will provide different results. All results are between the genotyped dataset and the imputed dataset with an info filter of info > 0.3.
This result shows the Levenshtein distance:
Table showing the residuals for the linear model testing for significant difference in the Levenshtein string distance between assigned haplogroups

| Term | Df | Sum Sq | Mean Sq | F value | p value |
|---|---|---|---|---|---|
| refpan_maf | 2 | 4.240671 | 2.1203355 | 15.11616 | 6e-07 |
| Residuals | 294 | 41.239223 | 0.1402695 | NA | NA |

Table showing the estimated marginal means for the linear model testing for significant difference in the mean Levenshtein string distance between assigned haplogroups for different Reference Panel minor allele frequency filtering thresholds

| Reference panel MAF | Estimated marginal mean | SE | Df | Lower CL | Upper CL |
|---|---|---|---|---|---|
| MAF1% | 0.3899951 | 0.0372667 | 294 | 0.3166518 | 0.4633384 |
| MAF0.5% | 0.1254856 | 0.0369031 | 294 | 0.0528579 | 0.1981134 |
| MAF0.1% | 0.1538171 | 0.0388365 | 294 | 0.0773843 | 0.2302498 |

Table showing the contrasts for the linear model testing for significant difference in the mean Levenshtein string distance between assigned haplogroups for different Reference Panel minor allele frequency filtering thresholds

| Contrast | Estimate | SE | Df | t ratio | p value |
|---|---|---|---|---|---|
| MAF1% - MAF0.5% | 0.2645094 | 0.0524466 | 294 | 5.0434049 | 0.0000024 |
| MAF1% - MAF0.1% | 0.2361780 | 0.0538245 | 294 | 4.3879250 | 0.0000474 |
| MAF0.5% - MAF0.1% | -0.0283314 | 0.0535734 | 294 | -0.5288336 | 0.8572589 |
HaploGrep string distance (Jaccard)
We also examined the distance between the strings of assigned haplogroups, as measures of haplogroup concordance may be misleading if only one sub-haplogroup level is incorrectly assigned. We used a few different measures, as different measures of distance will provide different results. All results are between the genotyped dataset and the imputed dataset with an info filter of info > 0.3.
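As a rough illustration of how a Jaccard distance between two haplogroup labels can behave, the Python sketch below compares the sets of q-grams (substrings of length q) in each label. This is an assumption about the definition, not a description of the code behind these results: the function name, the default of q = 1 (single characters), and the example labels are all hypothetical.

```python
def jaccard_distance(a: str, b: str, q: int = 1) -> float:
    """Jaccard distance between the sets of q-grams of two strings.

    Defined as 1 - |A intersect B| / |A union B|, where A and B are the
    sets of length-q substrings. Order and repeated q-grams are ignored.
    """
    grams_a = {a[i:i + q] for i in range(len(a) - q + 1)}
    grams_b = {b[i:i + q] for i in range(len(b) - q + 1)}
    union = grams_a | grams_b
    if not union:
        return 0.0  # two strings with no q-grams are treated as identical
    return 1 - len(grams_a & grams_b) / len(union)


# Hypothetical haplogroup labels.
print(jaccard_distance("U5b2", "U5a1"))  # shares U and 5 out of six symbols
print(jaccard_distance("H1a1", "H1a"))   # same character set, distance 0.0
```

The second example shows a limitation of the set-based definition: two labels at different depths of the tree can have a distance of zero, so this measure is best read alongside the edit distances rather than on its own.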
This result shows the Jaccard distance:
Table showing the residuals for the linear model testing for significant difference in the Jaccard string distance between assigned haplogroups

| Term | Df | Sum Sq | Mean Sq | F value | p value |
|---|---|---|---|---|---|
| refpan_maf | 2 | 0.1391510 | 0.0695755 | 278.8615 | 0 |
| Residuals | 294 | 0.0733525 | 0.0002495 | NA | NA |

Table showing the estimated marginal means for the linear model testing for significant difference in the mean Jaccard string distance between assigned haplogroups for different Reference Panel minor allele frequency filtering thresholds

| Reference panel MAF | Estimated marginal mean | SE | Df | Lower CL | Upper CL |
|---|---|---|---|---|---|
| MAF1% | 0.0261476 | 0.0015717 | 294 | 0.0230544 | 0.0292409 |
| MAF0.5% | -0.0076583 | 0.0015564 | 294 | -0.0107214 | -0.0045952 |
| MAF0.1% | -0.0265022 | 0.0016379 | 294 | -0.0297257 | -0.0232787 |

Table showing the contrasts for the linear model testing for significant difference in the mean Jaccard string distance between assigned haplogroups for different Reference Panel minor allele frequency filtering thresholds

| Contrast | Estimate | SE | Df | t ratio | p value |
|---|---|---|---|---|---|
| MAF1% - MAF0.5% | 0.0338059 | 0.0022119 | 294 | 15.283508 | 0 |
| MAF1% - MAF0.1% | 0.0526498 | 0.0022700 | 294 | 23.193401 | 0 |
| MAF0.5% - MAF0.1% | 0.0188439 | 0.0022594 | 294 | 8.340065 | 0 |
Matthews Correlation Coefficient (MCC)
We also determined imputation accuracy using the Matthews correlation coefficient (MCC). The MCC is a more direct method of measuring the imputation accuracy of genotypes (as opposed to haplotypes).
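The Python sketch below shows the standard MCC formula applied to binary genotype calls at a single site, alongside plain concordance for comparison. It is an illustration rather than the code used here: the function names, the 0/1 allele coding, and the example calls are hypothetical, and returning 0.0 when the denominator is zero is a common convention that I am assuming.

```python
from math import sqrt


def confusion_counts(truth, imputed):
    """Count true/false positives and negatives for 0/1 allele calls."""
    tp = sum(t == 1 and i == 1 for t, i in zip(truth, imputed))
    tn = sum(t == 0 and i == 0 for t, i in zip(truth, imputed))
    fp = sum(t == 0 and i == 1 for t, i in zip(truth, imputed))
    fn = sum(t == 1 and i == 0 for t, i in zip(truth, imputed))
    return tp, tn, fp, fn


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical site: ten samples, two carry the alternate allele,
# and imputation misses one of them.
truth = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
imputed = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

counts = confusion_counts(truth, imputed)
concordance = sum(t == i for t, i in zip(truth, imputed)) / len(truth)
print(counts, concordance, round(mcc(*counts), 4))
```

In this toy case concordance is 0.9 while the MCC is about 0.67, because the MCC penalises errors on the rare allele more heavily. This is one reason the two measures can diverge at low minor allele frequencies.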
Table showing the residuals for the linear model testing for significant difference in the Matthews correlation coefficient between assigned haplogroups

| Term | Df | Sum Sq | Mean Sq | F value | p value |
|---|---|---|---|---|---|
| refpan_maf | 2 | 1.928274 | 0.9641368 | 123.0775 | 0 |
| Residuals | 304 | 2.381407 | 0.0078336 | NA | NA |

Table showing the estimated marginal means for the linear model testing for significant difference in the means of Matthews correlation coefficient for different Reference Panel minor allele frequency filtering thresholds

| Reference panel MAF | Estimated marginal mean | SE | Df | Lower CL | Upper CL |
|---|---|---|---|---|---|
| MAF1% | 0.8667414 | 0.0088068 | 304 | 0.8494113 | 0.8840714 |
| MAF0.5% | 0.7584052 | 0.0087209 | 304 | 0.7412442 | 0.7755662 |
| MAF0.1% | 0.6726554 | 0.0087209 | 304 | 0.6554944 | 0.6898164 |

Table showing the contrasts for the linear model testing for significant difference in the means of Matthews correlation coefficient for different Reference Panel minor allele frequency filtering thresholds

| Contrast | Estimate | SE | Df | t ratio | p value |
|---|---|---|---|---|---|
| MAF1% - MAF0.5% | 0.1083362 | 0.0123941 | 304 | 8.740931 | 0 |
| MAF1% - MAF0.1% | 0.1940860 | 0.0123941 | 304 | 15.659517 | 0 |
| MAF0.5% - MAF0.1% | 0.0857498 | 0.0123332 | 304 | 6.952752 | 0 |
IMPUTE2 INFO Score
We are also reporting IMPUTE2’s INFO score. Here I will plot INFO scores for both the raw imputed data and the imputed data after info score filtering.
Table showing the residuals for the linear model testing for significant difference in the IMPUTE2 INFO Score between assigned haplogroups

| Term | Df | Sum Sq | Mean Sq | F value | p value |
|---|---|---|---|---|---|
| refpan_maf | 2 | 5.610955 | 2.8054777 | 132.0276 | 0 |
| Residuals | 304 | 6.459749 | 0.0212492 | NA | NA |

Table showing the estimated marginal means for the linear model testing for significant difference in the means of IMPUTE2 INFO Score for different Reference Panel minor allele frequency filtering thresholds

| Reference panel MAF | Estimated marginal mean | SE | Df | Lower CL | Upper CL |
|---|---|---|---|---|---|
| MAF1% | 0.7392964 | 0.0145048 | 304 | 0.7107539 | 0.7678388 |
| MAF0.5% | 0.6412761 | 0.0143632 | 304 | 0.6130121 | 0.6695401 |
| MAF0.1% | 0.4162709 | 0.0143632 | 304 | 0.3880069 | 0.4445348 |

Table showing the contrasts for the linear model testing for significant difference in the means of IMPUTE2 INFO Score for different Reference Panel minor allele frequency filtering thresholds

| Contrast | Estimate | SE | Df | t ratio | p value |
|---|---|---|---|---|---|
| MAF1% - MAF0.5% | 0.0980203 | 0.0204130 | 304 | 4.801855 | 7.4e-06 |
| MAF1% - MAF0.1% | 0.3230255 | 0.0204130 | 304 | 15.824500 | 0.0e+00 |
| MAF0.5% - MAF0.1% | 0.2250052 | 0.0203127 | 304 | 11.077078 | 0.0e+00 |
Table showing the residuals for the linear model testing for significant difference in the IMPUTE2 INFO Score (following filtering to info > 0.3) between assigned haplogroups

| Term | Df | Sum Sq | Mean Sq | F value | p value |
|---|---|---|---|---|---|
| refpan_maf | 2 | 1.097597 | 0.5487984 | 224.6923 | 0 |
| Residuals | 304 | 0.742503 | 0.0024424 | NA | NA |

Table showing the estimated marginal means for the linear model testing for significant difference in the means of IMPUTE2 INFO Score (following filtering to info > 0.3) for different Reference Panel minor allele frequency filtering thresholds

| Reference panel MAF | Estimated marginal mean | SE | Df | Lower CL | Upper CL |
|---|---|---|---|---|---|
| MAF1% | 0.8461870 | 0.0049176 | 304 | 0.8365101 | 0.8558638 |
| MAF0.5% | 0.7911693 | 0.0048696 | 304 | 0.7815869 | 0.8007517 |
| MAF0.1% | 0.7010144 | 0.0048696 | 304 | 0.6914320 | 0.7105968 |

Table showing the contrasts for the linear model testing for significant difference in the means of IMPUTE2 INFO Score (following filtering to info > 0.3) for different Reference Panel minor allele frequency filtering thresholds

| Contrast | Estimate | SE | Df | t ratio | p value |
|---|---|---|---|---|---|
| MAF1% - MAF0.5% | 0.0550177 | 0.0069207 | 304 | 7.94976 | 0 |
| MAF1% - MAF0.1% | 0.1451726 | 0.0069207 | 304 | 20.97667 | 0 |
| MAF0.5% - MAF0.1% | 0.0901549 | 0.0068867 | 304 | 13.09124 | 0 |
Number of included reference haplotypes (k_hap) experiments
This section details the experiments on the number of included reference haplotypes (k_hap).
## Rows: 1,161
## Columns: 71
## $ array <fct> BDCHP-1X10-HUMANHAP24…
## $ mcmc <chr> "kHAP100", "MCMC1", "…
## $ refpan_maf <ord> MAF1%, MAF1%, MAF1%, …
## $ k_hap <ord> kHAP100, kHAP100, kHA…
## $ imputed <lgl> TRUE, FALSE, FALSE, F…
## $ info_cutoff <dbl> 0.3, NA, NA, NA, NA, …
## $ n_snps_array <dbl> 309, NA, NA, NA, NA, …
## $ n_snps_imputed <dbl> 500, NA, NA, NA, NA, …
## $ n_snps_cutoff_imputed <dbl> 492, NA, NA, NA, NA, …
## $ n_type_0 <dbl> 198, NA, NA, NA, NA, …
## $ n_type_1 <dbl> 0, NA, NA, NA, NA, 0,…
## $ n_type_2 <dbl> 229, NA, NA, NA, NA, …
## $ n_type_3 <dbl> 73, NA, NA, NA, NA, 4…
## $ n_type_0_cutoff <dbl> 190, NA, NA, NA, NA, …
## $ n_type_1_cutoff <dbl> 0, NA, NA, NA, NA, 0,…
## $ n_type_2_cutoff <dbl> 229, NA, NA, NA, NA, …
## $ n_type_3_cutoff <dbl> 73, NA, NA, NA, NA, 4…
## $ mean_info <dbl> 0.8949340, NA, NA, NA…
## $ mean_info_cutoff <dbl> 0.9074289, NA, NA, NA…
## $ mean_maf <dbl> 0.06253400, NA, NA, N…
## $ mean_maf_cutoff <dbl> 0.06353455, NA, NA, N…
## $ mean_mcc <dbl> 0.8104934, NA, NA, NA…
## $ mean_mcc_cutoff <dbl> 0.8462349, NA, NA, NA…
## $ mean_concordance <dbl> 0.9949293, NA, NA, NA…
## $ mean_concordance_cutoff <dbl> 0.9949061, NA, NA, NA…
## $ mean_certainty <dbl> 0.9975062, NA, NA, NA…
## $ mean_certainty_cutoff <dbl> 0.9974895, NA, NA, NA…
## $ mean_himc_concordance_typed <dbl> 0.9806553, NA, NA, NA…
## $ mean_himc_concordance_typed_macro <dbl> 0.9936834, NA, NA, NA…
## $ mean_himc_concordance_imputed <dbl> 0.9893407, NA, NA, NA…
## $ mean_himc_concordance_imputed_cutoff <dbl> 0.9893407, NA, NA, NA…
## $ mean_himc_concordance_imputed_macro <dbl> 1.0000000, NA, NA, NA…
## $ mean_himc_concordance_imputed_macro_cutoff <dbl> 1.0000000, NA, NA, NA…
## $ mean_haplogrep_concordance_typed <dbl> 0.3062352, NA, NA, NA…
## $ mean_haplogrep_concordance_typed_macro <dbl> 0.9932912, NA, NA, NA…
## $ mean_haplogrep_concordance_imputed <dbl> 0.3026835, NA, NA, NA…
## $ mean_haplogrep_concordance_imputed_cutoff <dbl> 0.3026835, NA, NA, NA…
## $ mean_haplogrep_concordance_imputed_macro <dbl> 0.9952644, NA, NA, NA…
## $ mean_haplogrep_concordance_imputed_macro_cutoff <dbl> 0.9952644, NA, NA, NA…
## $ mean_haplogrep_quality_truth <dbl> 0.8560609, NA, NA, NA…
## $ mean_haplogrep_quality_typed <dbl> 0.9822484, NA, NA, NA…
## $ mean_haplogrep_quality_imputed <dbl> 0.9789768, NA, NA, NA…
## $ mean_haplogrep_quality_imputed_cutoff <dbl> 0.9790359, NA, NA, NA…
## $ mean_haplogrep_distance_dl_typed <dbl> 1.865430, NA, NA, NA,…
## $ mean_haplogrep_distance_dl_imputed <dbl> 2.093528, NA, NA, NA,…
## $ mean_haplogrep_distance_dl_imputed_cutoff <dbl> 2.094317, NA, NA, NA,…
## $ mean_haplogrep_distance_lv_typed <dbl> 1.865430, NA, NA, NA,…
## $ mean_haplogrep_distance_lv_imputed <dbl> 2.093528, NA, NA, NA,…
## $ mean_haplogrep_distance_lv_imputed_cutoff <dbl> 2.094317, NA, NA, NA,…
## $ mean_haplogrep_distance_jc_typed <dbl> 0.2800019, NA, NA, NA…
## $ mean_haplogrep_distance_jc_imputed <dbl> 0.2901451, NA, NA, NA…
## $ mean_haplogrep_distance_jc_imputed_cutoff <dbl> 0.2901677, NA, NA, NA…
## $ himc_diff <dbl> 0.008685353, NA, NA, …
## $ himc_cutoff_diff <dbl> 0.008685353, NA, NA, …
## $ himc_macro_diff <dbl> 0.006316621, NA, NA, …
## $ himc_macro_cutoff_diff <dbl> 0.006316621, NA, NA, …
## $ haplogrep_diff <dbl> -0.003551697, NA, NA,…
## $ haplogrep_cutoff_diff <dbl> -0.003551697, NA, NA,…
## $ haplogrep_macro_diff <dbl> 0.001973165, NA, NA, …
## $ haplogrep_macro_cutoff_diff <dbl> 0.001973165, NA, NA, …
## $ haplogrep_quality_diff <dbl> -0.003271665, NA, NA,…
## $ haplogrep_quality_cutoff_diff <dbl> -0.003212510, NA, NA,…
## $ haplogrep_quality_diff_truth_typed <dbl> -0.1261875, NA, NA, N…
## $ haplogrep_quality_diff_truth_imputed <dbl> -0.1229158, NA, NA, N…
## $ haplogrep_quality_diff_truth_imputed_cutoff <dbl> -0.1229750, NA, NA, N…
## $ haplogrep_distance_dl_diff <dbl> 0.2280979, NA, NA, NA…
## $ haplogrep_distance_dl_cutoff_diff <dbl> 0.2288871, NA, NA, NA…
## $ haplogrep_distance_lv_diff <dbl> 0.2280979, NA, NA, NA…
## $ haplogrep_distance_lv_cutoff_diff <dbl> 0.2288871, NA, NA, NA…
## $ haplogrep_distance_jc_diff <dbl> 0.010143189, NA, NA, …
## $ haplogrep_distance_jc_cutoff_diff <dbl> 0.0101657397, NA, NA,…
HiMC
HiMC Haplogrouping
We previously found that imputing missing variants increased the accuracy of haplogroup assignments when using HiMC to assign haplogroups.
Compare this result with the imputed data, which shows a higher haplogroup concordance:
If the improvement in accurate assignment of haplogroups wasn’t evident from the last two plots, displaying the mean difference should make this clear:
Table showing the analysis of variance for the linear model testing for significant difference in the means of imputed haplogroup concordance
| Term | df | Sum of squares | Mean square | F statistic | p-value |
|---|---|---|---|---|---|
| k_hap | 8 | 20.34980 | 2.5437249 | 92.33707 | 0 |
| Residuals | 900 | 24.79343 | 0.0275483 | NA | NA |
Table showing the estimated marginal means for the linear model testing for significant difference in the means of imputed haplogroup concordance for different numbers of included reference haplotypes (k_hap)
| k_hap | Estimated marginal mean | SE | df | Lower 95% CL | Upper 95% CL |
|---|---|---|---|---|---|
| kHAP100 | 0.8671472 | 0.0165153 | 900 | 0.8347342 | 0.8995601 |
| kHAP250 | 0.8643340 | 0.0165153 | 900 | 0.8319211 | 0.8967470 |
| kHAP500 | 0.8601259 | 0.0165153 | 900 | 0.8277130 | 0.8925389 |
| kHAP1000 | 0.8486516 | 0.0165153 | 900 | 0.8162386 | 0.8810645 |
| kHAP2500 | 0.8119014 | 0.0165153 | 900 | 0.7794885 | 0.8443144 |
| kHAP5000 | 0.7220557 | 0.0165153 | 900 | 0.6896427 | 0.7544687 |
| kHAP10000 | 0.5948746 | 0.0165153 | 900 | 0.5624617 | 0.6272876 |
| kHAP20000 | 0.5225614 | 0.0165153 | 900 | 0.4901484 | 0.5549744 |
| kHAP30000 | 0.4739555 | 0.0165153 | 900 | 0.4415425 | 0.5063685 |
Table showing the contrasts for the linear model testing for significant difference in the means of imputed haplogroup concordance for different numbers of included reference haplotypes (k_hap)
| Contrast | Estimate | SE | df | t ratio | p-value |
|---|---|---|---|---|---|
| kHAP100 - kHAP250 | 0.0028131 | 0.0233562 | 900 | 0.1204443 | 1.0000000 |
| kHAP100 - kHAP500 | 0.0070212 | 0.0233562 | 900 | 0.3006153 | 0.9999981 |
| kHAP100 - kHAP1000 | 0.0184956 | 0.0233562 | 900 | 0.7918939 | 0.9970862 |
| kHAP100 - kHAP2500 | 0.0552457 | 0.0233562 | 900 | 2.3653606 | 0.3045080 |
| kHAP100 - kHAP5000 | 0.1450915 | 0.0233562 | 900 | 6.2121295 | 0.0000000 |
| kHAP100 - kHAP10000 | 0.2722725 | 0.0233562 | 900 | 11.6574208 | 0.0000000 |
| kHAP100 - kHAP20000 | 0.3445857 | 0.0233562 | 900 | 14.7535307 | 0.0000000 |
| kHAP100 - kHAP30000 | 0.3931917 | 0.0233562 | 900 | 16.8346063 | 0.0000000 |
| kHAP250 - kHAP500 | 0.0042081 | 0.0233562 | 900 | 0.1801711 | 1.0000000 |
| kHAP250 - kHAP1000 | 0.0156825 | 0.0233562 | 900 | 0.6714496 | 0.9991049 |
| kHAP250 - kHAP2500 | 0.0524326 | 0.0233562 | 900 | 2.2449163 | 0.3774111 |
| kHAP250 - kHAP5000 | 0.1422783 | 0.0233562 | 900 | 6.0916852 | 0.0000001 |
| kHAP250 - kHAP10000 | 0.2694594 | 0.0233562 | 900 | 11.5369765 | 0.0000000 |
| kHAP250 - kHAP20000 | 0.3417726 | 0.0233562 | 900 | 14.6330864 | 0.0000000 |
| kHAP250 - kHAP30000 | 0.3903785 | 0.0233562 | 900 | 16.7141620 | 0.0000000 |
| kHAP500 - kHAP1000 | 0.0114744 | 0.0233562 | 900 | 0.4912786 | 0.9999130 |
| kHAP500 - kHAP2500 | 0.0482245 | 0.0233562 | 900 | 2.0647453 | 0.4983509 |
| kHAP500 - kHAP5000 | 0.1380702 | 0.0233562 | 900 | 5.9115142 | 0.0000002 |
| kHAP500 - kHAP10000 | 0.2652513 | 0.0233562 | 900 | 11.3568055 | 0.0000000 |
| kHAP500 - kHAP20000 | 0.3375645 | 0.0233562 | 900 | 14.4529153 | 0.0000000 |
| kHAP500 - kHAP30000 | 0.3861704 | 0.0233562 | 900 | 16.5339909 | 0.0000000 |
| kHAP1000 - kHAP2500 | 0.0367501 | 0.0233562 | 900 | 1.5734667 | 0.8191720 |
| kHAP1000 - kHAP5000 | 0.1265959 | 0.0233562 | 900 | 5.4202356 | 0.0000027 |
| kHAP1000 - kHAP10000 | 0.2537769 | 0.0233562 | 900 | 10.8655269 | 0.0000000 |
| kHAP1000 - kHAP20000 | 0.3260901 | 0.0233562 | 900 | 13.9616368 | 0.0000000 |
| kHAP1000 - kHAP30000 | 0.3746961 | 0.0233562 | 900 | 16.0427124 | 0.0000000 |
| kHAP2500 - kHAP5000 | 0.0898457 | 0.0233562 | 900 | 3.8467689 | 0.0040590 |
| kHAP2500 - kHAP10000 | 0.2170268 | 0.0233562 | 900 | 9.2920602 | 0.0000000 |
| kHAP2500 - kHAP20000 | 0.2893400 | 0.0233562 | 900 | 12.3881701 | 0.0000000 |
| kHAP2500 - kHAP30000 | 0.3379459 | 0.0233562 | 900 | 14.4692457 | 0.0000000 |
| kHAP5000 - kHAP10000 | 0.1271811 | 0.0233562 | 900 | 5.4452913 | 0.0000024 |
| kHAP5000 - kHAP20000 | 0.1994943 | 0.0233562 | 900 | 8.5414012 | 0.0000000 |
| kHAP5000 - kHAP30000 | 0.2481002 | 0.0233562 | 900 | 10.6224768 | 0.0000000 |
| kHAP10000 - kHAP20000 | 0.0723132 | 0.0233562 | 900 | 3.0961099 | 0.0519824 |
| kHAP10000 - kHAP30000 | 0.1209191 | 0.0233562 | 900 | 5.1771855 | 0.0000098 |
| kHAP20000 - kHAP30000 | 0.0486059 | 0.0233562 | 900 | 2.0810756 | 0.4869961 |
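With 36 pairwise contrasts, it helps to summarise which settings differ from the smallest panel size. The Python sketch below takes the last value reported for each kHAP100 contrast above (read here as a p-value) and splits the settings at a 0.05 threshold. The threshold is an assumption, since the report does not state a significance level.

```python
ALPHA = 0.05  # conventional threshold; an assumption, not stated in the report

# Last value for kHAP100 versus each larger k_hap, copied from the table above.
p_vs_khap100 = {
    "kHAP250": 1.0000000,
    "kHAP500": 0.9999981,
    "kHAP1000": 0.9970862,
    "kHAP2500": 0.3045080,
    "kHAP5000": 0.0000000,
    "kHAP10000": 0.0000000,
    "kHAP20000": 0.0000000,
    "kHAP30000": 0.0000000,
}

significant = [level for level, p in p_vs_khap100.items() if p < ALPHA]
not_significant = [level for level, p in p_vs_khap100.items() if p >= ALPHA]

print("differs from kHAP100:", significant)
print("no detectable difference from kHAP100:", not_significant)
```

Under that threshold, imputed haplogroup concordance at kHAP100 is not distinguishable from kHAP250 through kHAP2500, but is distinguishable from kHAP5000 and every larger setting.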
Table showing the analysis of variance for the linear model testing for significant difference in the mean concordance of assigned haplogroups between genotyped and imputed data
| Term | df | Sum of squares | Mean square | F statistic | p-value |
|---|---|---|---|---|---|
| k_hap | 8 | 20.34980 | 2.543725 | 159.3915 | 0 |
| Residuals | 900 | 14.36308 | 0.015959 | NA | NA |
Table showing the estimated marginal means for the linear model testing for significant difference in the mean concordance of assigned haplogroups between genotyped and imputed data for different numbers of included reference haplotypes (k_hap)
| k_hap | Estimated marginal mean | SE | df | Lower 95% CL | Upper 95% CL |
|---|---|---|---|---|---|
| kHAP100 | 0.3173856 | 0.0125702 | 900 | 0.2927153 | 0.3420559 |
| kHAP250 | 0.3145725 | 0.0125702 | 900 | 0.2899022 | 0.3392428 |
| kHAP500 | 0.3103644 | 0.0125702 | 900 | 0.2856941 | 0.3350347 |
| kHAP1000 | 0.2988900 | 0.0125702 | 900 | 0.2742197 | 0.3235603 |
| kHAP2500 | 0.2621399 | 0.0125702 | 900 | 0.2374696 | 0.2868102 |
| kHAP5000 | 0.1722942 | 0.0125702 | 900 | 0.1476239 | 0.1969645 |
| kHAP10000 | 0.0451131 | 0.0125702 | 900 | 0.0204428 | 0.0697834 |
| kHAP20000 | -0.0272001 | 0.0125702 | 900 | -0.0518704 | -0.0025298 |
| kHAP30000 | -0.0758060 | 0.0125702 | 900 | -0.1004763 | -0.0511357 |
Table showing the contrasts for the linear model testing for significant difference in the mean concordance of assigned haplogroups between genotyped and imputed data for different numbers of included reference haplotypes (k_hap)
| Contrast | Estimate | SE | df | t ratio | p-value |
|---|---|---|---|---|---|
| kHAP100 - kHAP250 | 0.0028131 | 0.0177769 | 900 | 0.1582452 | 1.0000000 |
| kHAP100 - kHAP500 | 0.0070212 | 0.0177769 | 900 | 0.3949623 | 0.9999837 |
| kHAP100 - kHAP1000 | 0.0184956 | 0.0177769 | 900 | 1.0404267 | 0.9818242 |
| kHAP100 - kHAP2500 | 0.0552457 | 0.0177769 | 900 | 3.1077199 | 0.0502388 |
| kHAP100 - kHAP5000 | 0.1450915 | 0.0177769 | 900 | 8.1617823 | 0.0000000 |
| kHAP100 - kHAP10000 | 0.2722725 | 0.0177769 | 900 | 15.3160573 | 0.0000000 |
| kHAP100 - kHAP20000 | 0.3445857 | 0.0177769 | 900 | 19.3838693 | 0.0000000 |
| kHAP100 - kHAP30000 | 0.3931917 | 0.0177769 | 900 | 22.1180825 | 0.0000000 |
| kHAP250 - kHAP500 | 0.0042081 | 0.0177769 | 900 | 0.2367170 | 0.9999997 |
| kHAP250 - kHAP1000 | 0.0156825 | 0.0177769 | 900 | 0.8821815 | 0.9938601 |
| kHAP250 - kHAP2500 | 0.0524326 | 0.0177769 | 900 | 2.9494747 | 0.0787565 |
| kHAP250 - kHAP5000 | 0.1422783 | 0.0177769 | 900 | 8.0035371 | 0.0000000 |
| kHAP250 - kHAP10000 | 0.2694594 | 0.0177769 | 900 | 15.1578121 | 0.0000000 |
| kHAP250 - kHAP20000 | 0.3417726 | 0.0177769 | 900 | 19.2256241 | 0.0000000 |
| kHAP250 - kHAP30000 | 0.3903785 | 0.0177769 | 900 | 21.9598372 | 0.0000000 |
| kHAP500 - kHAP1000 | 0.0114744 | 0.0177769 | 900 | 0.6454645 | 0.9993292 |
| kHAP500 - kHAP2500 | 0.0482245 | 0.0177769 | 900 | 2.7127576 | 0.1446827 |
| kHAP500 - kHAP5000 | 0.1380702 | 0.0177769 | 900 | 7.7668201 | 0.0000000 |
| kHAP500 - kHAP10000 | 0.2652513 | 0.0177769 | 900 | 14.9210950 | 0.0000000 |
| kHAP500 - kHAP20000 | 0.3375645 | 0.0177769 | 900 | 18.9889071 | 0.0000000 |
| kHAP500 - kHAP30000 | 0.3861704 | 0.0177769 | 900 | 21.7231202 | 0.0000000 |
| kHAP1000 - kHAP2500 | 0.0367501 | 0.0177769 | 900 | 2.0672932 | 0.4965760 |
| kHAP1000 - kHAP5000 | 0.1265959 | 0.0177769 | 900 | 7.1213556 | 0.0000000 |
| kHAP1000 - kHAP10000 | 0.2537769 | 0.0177769 | 900 | 14.2756306 | 0.0000000 |
| kHAP1000 - kHAP20000 | 0.3260901 | 0.0177769 | 900 | 18.3434426 | 0.0000000 |
| kHAP1000 - kHAP30000 | 0.3746961 | 0.0177769 | 900 | 21.0776557 | 0.0000000 |
| kHAP2500 - kHAP5000 | 0.0898457 | 0.0177769 | 900 | 5.0540624 | 0.0000185 |
| kHAP2500 - kHAP10000 | 0.2170268 | 0.0177769 | 900 | 12.2083374 | 0.0000000 |
| kHAP2500 - kHAP20000 | 0.2893400 | 0.0177769 | 900 | 16.2761494 | 0.0000000 |
| kHAP2500 - kHAP30000 | 0.3379459 | 0.0177769 | 900 | 19.0103626 | 0.0000000 |
| kHAP5000 - kHAP10000 | 0.1271811 | 0.0177769 | 900 | 7.1542750 | 0.0000000 |
| kHAP5000 - kHAP20000 | 0.1994943 | 0.0177769 | 900 | 11.2220870 | 0.0000000 |
| kHAP5000 - kHAP30000 | 0.2481002 | 0.0177769 | 900 | 13.9563001 | 0.0000000 |
| kHAP10000 - kHAP20000 | 0.0723132 | 0.0177769 | 900 | 4.0678120 | 0.0016920 |
| kHAP10000 - kHAP30000 | 0.1209191 | 0.0177769 | 900 | 6.8020252 | 0.0000000 |
| kHAP20000 - kHAP30000 | 0.0486059 | 0.0177769 | 900 | 2.7342131 | 0.1373727 |
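The contrast estimates here are identical to those reported earlier for imputed haplogroup concordance, and only the standard errors differ. One way to see why is to compare the two sets of estimated marginal means. The Python sketch below subtracts the difference-model means from the imputed-concordance means, using values copied from the two tables. The offset is the same for every k_hap to the displayed precision, which would be expected if the genotyped concordance being subtracted does not vary with k_hap. That reading is an inference from the numbers, not something the report states.

```python
# Estimated marginal means for kHAP100 ... kHAP30000, copied from the report.
imputed = [0.8671472, 0.8643340, 0.8601259, 0.8486516, 0.8119014,
           0.7220557, 0.5948746, 0.5225614, 0.4739555]
difference = [0.3173856, 0.3145725, 0.3103644, 0.2988900, 0.2621399,
              0.1722942, 0.0451131, -0.0272001, -0.0758060]

# Offset between the two models at each k_hap level.
offsets = [i - d for i, d in zip(imputed, difference)]
spread = max(offsets) - min(offsets)

print("offsets:", [round(o, 7) for o in offsets])
print(f"spread across k_hap levels: {spread:.1e}")
```

A constant offset leaves every pairwise difference unchanged, so the two models can share contrast estimates while having different residual variances and therefore different t ratios.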
HiMC Macrohaplogrouping
This trend can be seen further when only macro-haplogroups are considered:
Compare this result with the imputed data, which shows a higher haplogroup concordance:
If the improvement in accurate assignment of haplogroups wasn’t evident from the last two plots, displaying the mean difference should make this clear:
These can be statistically tested with linear models:
Table showing the analysis of variance for the linear model testing for significant difference in the means of imputed macrohaplogroup concordance
| Term | df | Sum of squares | Mean square | F statistic | p-value |
|---|---|---|---|---|---|
| k_hap | 8 | 10.38166 | 1.2977077 | 46.20055 | 0 |
| Residuals | 900 | 25.27972 | 0.0280886 | NA | NA |
Table showing the estimated marginal means for the linear model testing for significant difference in the means of imputed macrohaplogroup concordance for different numbers of included reference haplotypes (k_hap)
| k_hap | Estimated marginal mean | SE | df | Lower 95% CL | Upper 95% CL |
|---|---|---|---|---|---|
| kHAP100 | 0.8992230 | 0.0166765 | 900 | 0.8664937 | 0.9319523 |
| kHAP250 | 0.8966250 | 0.0166765 | 900 | 0.8638957 | 0.9293543 |
| kHAP500 | 0.8937972 | 0.0166765 | 900 | 0.8610679 | 0.9265265 |
| kHAP1000 | 0.8863748 | 0.0166765 | 900 | 0.8536455 | 0.9191041 |
| kHAP2500 | 0.8616677 | 0.0166765 | 900 | 0.8289384 | 0.8943970 |
| kHAP5000 | 0.8112020 | 0.0166765 | 900 | 0.7784727 | 0.8439313 |
| kHAP10000 | 0.7110193 | 0.0166765 | 900 | 0.6782900 | 0.7437486 |
| kHAP20000 | 0.6477076 | 0.0166765 | 900 | 0.6149783 | 0.6804369 |
| kHAP30000 | 0.6200152 | 0.0166765 | 900 | 0.5872859 | 0.6527445 |
Table showing the contrasts for the linear model testing for significant difference in the means of imputed macrohaplogroup concordance for different numbers of included reference haplotypes (k_hap)
| Contrast | Estimate | SE | df | t ratio | p-value |
|---|---|---|---|---|---|
| kHAP100 - kHAP250 | 0.0025980 | 0.0235841 | 900 | 0.1101600 | 1.0000000 |
| kHAP100 - kHAP500 | 0.0054258 | 0.0235841 | 900 | 0.2300617 | 0.9999998 |
| kHAP100 - kHAP1000 | 0.0128482 | 0.0235841 | 900 | 0.5447840 | 0.9998098 |
| kHAP100 - kHAP2500 | 0.0375553 | 0.0235841 | 900 | 1.5924014 | 0.8090568 |
| kHAP100 - kHAP5000 | 0.0880210 | 0.0235841 | 900 | 3.7322197 | 0.0062535 |
| kHAP100 - kHAP10000 | 0.1882037 | 0.0235841 | 900 | 7.9801129 | 0.0000000 |
| kHAP100 - kHAP20000 | 0.2515154 | 0.0235841 | 900 | 10.6646214 | 0.0000000 |
| kHAP100 - kHAP30000 | 0.2792078 | 0.0235841 | 900 | 11.8388188 | 0.0000000 |
| kHAP250 - kHAP500 | 0.0028278 | 0.0235841 | 900 | 0.1199016 | 1.0000000 |
| kHAP250 - kHAP1000 | 0.0102502 | 0.0235841 | 900 | 0.4346239 | 0.9999659 |
| kHAP250 - kHAP2500 | 0.0349573 | 0.0235841 | 900 | 1.4822414 | 0.8638095 |
| kHAP250 - kHAP5000 | 0.0854230 | 0.0235841 | 900 | 3.6220597 | 0.0093438 |
| kHAP250 - kHAP10000 | 0.1856057 | 0.0235841 | 900 | 7.8699529 | 0.0000000 |
| kHAP250 - kHAP20000 | 0.2489174 | 0.0235841 | 900 | 10.5544613 | 0.0000000 |
| kHAP250 - kHAP30000 | 0.2766098 | 0.0235841 | 900 | 11.7286587 | 0.0000000 |
| kHAP500 - kHAP1000 | 0.0074224 | 0.0235841 | 900 | 0.3147223 | 0.9999972 |
| kHAP500 - kHAP2500 | 0.0321295 | 0.0235841 | 900 | 1.3623398 | 0.9115484 |
| kHAP500 - kHAP5000 | 0.0825952 | 0.0235841 | 900 | 3.5021580 | 0.0142352 |
| kHAP500 - kHAP10000 | 0.1827779 | 0.0235841 | 900 | 7.7500513 | 0.0000000 |
| kHAP500 - kHAP20000 | 0.2460896 | 0.0235841 | 900 | 10.4345597 | 0.0000000 |
| kHAP500 - kHAP30000 | 0.2737820 | 0.0235841 | 900 | 11.6087571 | 0.0000000 |
| kHAP1000 - kHAP2500 | 0.0247071 | 0.0235841 | 900 | 1.0476175 | 0.9810132 |
| kHAP1000 - kHAP5000 | 0.0751728 | 0.0235841 | 900 | 3.1874357 | 0.0395584 |
| kHAP1000 - kHAP10000 | 0.1753555 | 0.0235841 | 900 | 7.4353290 | 0.0000000 |
| kHAP1000 - kHAP20000 | 0.2386672 | 0.0235841 | 900 | 10.1198374 | 0.0000000 |
| kHAP1000 - kHAP30000 | 0.2663596 | 0.0235841 | 900 | 11.2940348 | 0.0000000 |
| kHAP2500 - kHAP5000 | 0.0504657 | 0.0235841 | 900 | 2.1398182 | 0.4466629 |
| kHAP2500 - kHAP10000 | 0.1506484 | 0.0235841 | 900 | 6.3877115 | 0.0000000 |
| kHAP2500 - kHAP20000 | 0.2139601 | 0.0235841 | 900 | 9.0722199 | 0.0000000 |
| kHAP2500 - kHAP30000 | 0.2416525 | 0.0235841 | 900 | 10.2464173 | 0.0000000 |
| kHAP5000 - kHAP10000 | 0.1001827 | 0.0235841 | 900 | 4.2478933 | 0.0007980 |
| kHAP5000 - kHAP20000 | 0.1634944 | 0.0235841 | 900 | 6.9324017 | 0.0000000 |
| kHAP5000 - kHAP30000 | 0.1911868 | 0.0235841 | 900 | 8.1065991 | 0.0000000 |
| kHAP10000 - kHAP20000 | 0.0633117 | 0.0235841 | 900 | 2.6845084 | 0.1547462 |
| kHAP10000 - kHAP30000 | 0.0910041 | 0.0235841 | 900 | 3.8587058 | 0.0038769 |
| kHAP20000 - kHAP30000 | 0.0276924 | 0.0235841 | 900 | 1.1741974 | 0.9617539 |
Table showing the analysis of variance for the linear model testing for significant difference in the mean concordance of assigned macrohaplogroups between genotyped and imputed data
| Term | df | Sum of squares | Mean square | F statistic | p-value |
|---|---|---|---|---|---|
| k_hap | 8 | 10.38166 | 1.2977077 | 57.74605 | 0 |
| Residuals | 900 | 20.22540 | 0.0224727 | NA | NA |
Table showing the estimated marginal means for the linear model testing for significant difference in the mean concordance of assigned macrohaplogroups between genotyped and imputed data for different numbers of included reference haplotypes (k_hap)
| k_hap | Estimated marginal mean | SE | df | Lower 95% CL | Upper 95% CL |
|---|---|---|---|---|---|
| kHAP100 | 0.2417318 | 0.0149165 | 900 | 0.2124566 | 0.2710069 |
| kHAP250 | 0.2391337 | 0.0149165 | 900 | 0.2098586 | 0.2684089 |
| kHAP500 | 0.2363060 | 0.0149165 | 900 | 0.2070308 | 0.2655811 |
| kHAP1000 | 0.2288835 | 0.0149165 | 900 | 0.1996084 | 0.2581587 |
| kHAP2500 | 0.2041764 | 0.0149165 | 900 | 0.1749013 | 0.2334516 |
| kHAP5000 | 0.1537108 | 0.0149165 | 900 | 0.1244356 | 0.1829859 |
| kHAP10000 | 0.0535281 | 0.0149165 | 900 | 0.0242529 | 0.0828032 |
| kHAP20000 | -0.0097836 | 0.0149165 | 900 | -0.0390588 | 0.0194915 |
| kHAP30000 | -0.0374760 | 0.0149165 | 900 | -0.0667512 | -0.0082009 |
Table showing the contrasts for the linear model testing for significant difference in the mean concordance of assigned macrohaplogroups between genotyped and imputed data for different numbers of included reference haplotypes (k_hap)
| Contrast | Estimate | SE | df | t ratio | p-value |
|---|---|---|---|---|---|
| kHAP100 - kHAP250 | 0.0025980 | 0.0210951 | 900 | 0.1231577 | 1.0000000 |
| kHAP100 - kHAP500 | 0.0054258 | 0.0210951 | 900 | 0.2572064 | 0.9999994 |
| kHAP100 - kHAP1000 | 0.0128482 | 0.0210951 | 900 | 0.6090625 | 0.9995627 |
| kHAP100 - kHAP2500 | 0.0375553 | 0.0210951 | 900 | 1.7802873 | 0.6953597 |
| kHAP100 - kHAP5000 | 0.0880210 | 0.0210951 | 900 | 4.1725807 | 0.0010972 |
| kHAP100 - kHAP10000 | 0.1882037 | 0.0210951 | 900 | 8.9216787 | 0.0000000 |
| kHAP100 - kHAP20000 | 0.2515154 | 0.0210951 | 900 | 11.9229297 | 0.0000000 |
| kHAP100 - kHAP30000 | 0.2792078 | 0.0210951 | 900 | 13.2356695 | 0.0000000 |
| kHAP250 - kHAP500 | 0.0028278 | 0.0210951 | 900 | 0.1340487 | 1.0000000 |
| kHAP250 - kHAP1000 | 0.0102502 | 0.0210951 | 900 | 0.4859048 | 0.9999200 |
| kHAP250 - kHAP2500 | 0.0349573 | 0.0210951 | 900 | 1.6571296 | 0.7724479 |
| kHAP250 - kHAP5000 | 0.0854230 | 0.0210951 | 900 | 4.0494230 | 0.0018234 |
| kHAP250 - kHAP10000 | 0.1856057 | 0.0210951 | 900 | 8.7985210 | 0.0000000 |
| kHAP250 - kHAP20000 | 0.2489174 | 0.0210951 | 900 | 11.7997720 | 0.0000000 |
| kHAP250 - kHAP30000 | 0.2766098 | 0.0210951 | 900 | 13.1125118 | 0.0000000 |
| kHAP500 - kHAP1000 | 0.0074224 | 0.0210951 | 900 | 0.3518561 | 0.9999934 |
| kHAP500 - kHAP2500 | 0.0321295 | 0.0210951 | 900 | 1.5230809 | 0.8446867 |
| kHAP500 - kHAP5000 | 0.0825952 | 0.0210951 | 900 | 3.9153742 | 0.0031114 |
| kHAP500 - kHAP10000 | 0.1827779 | 0.0210951 | 900 | 8.6644723 | 0.0000000 |
| kHAP500 - kHAP20000 | 0.2460896 | 0.0210951 | 900 | 11.6657233 | 0.0000000 |
| kHAP500 - kHAP30000 | 0.2737820 | 0.0210951 | 900 | 12.9784631 | 0.0000000 |
| kHAP1000 - kHAP2500 | 0.0247071 | 0.0210951 | 900 | 1.1712248 | 0.9623259 |
| kHAP1000 - kHAP5000 | 0.0751728 | 0.0210951 | 900 | 3.5635182 | 0.0115004 |
| kHAP1000 - kHAP10000 | 0.1753555 | 0.0210951 | 900 | 8.3126162 | 0.0000000 |
| kHAP1000 - kHAP20000 | 0.2386672 | 0.0210951 | 900 | 11.3138672 | 0.0000000 |
| kHAP1000 - kHAP30000 | 0.2663596 | 0.0210951 | 900 | 12.6266070 | 0.0000000 |
| kHAP2500 - kHAP5000 | 0.0504657 | 0.0210951 | 900 | 2.3922933 | 0.2893352 |
| kHAP2500 - kHAP10000 | 0.1506484 | 0.0210951 | 900 | 7.1413914 | 0.0000000 |
| kHAP2500 - kHAP20000 | 0.2139601 | 0.0210951 | 900 | 10.1426424 | 0.0000000 |
| kHAP2500 - kHAP30000 | 0.2416525 | 0.0210951 | 900 | 11.4553822 | 0.0000000 |
| kHAP5000 - kHAP10000 | 0.1001827 | 0.0210951 | 900 | 4.7490981 | 0.0000828 |
| kHAP5000 - kHAP20000 | 0.1634944 | 0.0210951 | 900 | 7.7503490 | 0.0000000 |
| kHAP5000 - kHAP30000 | 0.1911868 | 0.0210951 | 900 | 9.0630889 | 0.0000000 |
| kHAP10000 - kHAP20000 | 0.0633117 | 0.0210951 | 900 | 3.0012510 | 0.0682356 |
| kHAP10000 - kHAP30000 | 0.0910041 | 0.0210951 | 900 | 4.3139908 | 0.0006005 |
| kHAP20000 - kHAP30000 | 0.0276924 | 0.0210951 | 900 | 1.3127398 | 0.9276140 |
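These results suggest that the accuracy of haplogroup and macrohaplogroup assignment does not differ significantly among kHAP100, kHAP250, kHAP500, kHAP1000 and kHAP2500 (all pairwise p > 0.05), whereas kHAP10000, kHAP20000 and kHAP30000 each give significantly lower concordance than any of these smaller settings (all pairwise p < 0.001). kHAP5000 is intermediate: it differs significantly from kHAP100 through kHAP1000 in every model, but from kHAP2500 only for haplogroups, not macrohaplogroups.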
HaploGrep 2.0
HaploGrep Haplogrouping
We are investigating using HaploGrep 2.0 for assigning haplogroups, as HaploGrep has a greater ability to assign haplogroups that cover all sub-groupings.
Compare this result with the imputed data:
Displaying the mean difference between imputed and genotyped concordance makes the direction and size of the change clear:
Table showing the analysis of variance for the linear model testing for significant difference in the means of imputed haplogroup concordance
| Term | df | Sum of squares | Mean square | F statistic | p-value |
|---|---|---|---|---|---|
| k_hap | 8 | 1.451465 | 0.1814331 | 22.2032 | 0 |
| Residuals | 900 | 7.354335 | 0.0081715 | NA | NA |
Table showing the estimated marginal means for the linear model testing for significant difference in the means of imputed haplogroup concordance for different numbers of included reference haplotypes (k_hap)
| k_hap | Estimated marginal mean | SE | df | Lower 95% CL | Upper 95% CL |
|---|---|---|---|---|---|
| kHAP100 | 0.1848797 | 0.0089948 | 900 | 0.1672265 | 0.2025328 |
| kHAP250 | 0.1800738 | 0.0089948 | 900 | 0.1624206 | 0.1977269 |
| kHAP500 | 0.1673830 | 0.0089948 | 900 | 0.1497298 | 0.1850362 |
| kHAP1000 | 0.1517032 | 0.0089948 | 900 | 0.1340500 | 0.1693563 |
| kHAP2500 | 0.1267280 | 0.0089948 | 900 | 0.1090748 | 0.1443811 |
| kHAP5000 | 0.1056522 | 0.0089948 | 900 | 0.0879991 | 0.1233054 |
| kHAP10000 | 0.0909062 | 0.0089948 | 900 | 0.0732531 | 0.1085594 |
| kHAP20000 | 0.0830683 | 0.0089948 | 900 | 0.0654151 | 0.1007214 |
| kHAP30000 | 0.0785320 | 0.0089948 | 900 | 0.0608788 | 0.0961851 |
Table showing the contrasts for the linear model testing for significant difference in the means of imputed haplogroup concordance for different numbers of included reference haplotypes (k_hap)
| Contrast | Estimate | SE | df | t ratio | p-value |
|---|---|---|---|---|---|
| kHAP100 - kHAP250 | 0.0048059 | 0.0127205 | 900 | 0.3778091 | 0.9999885 |
| kHAP100 - kHAP500 | 0.0174967 | 0.0127205 | 900 | 1.3754707 | 0.9069373 |
| kHAP100 - kHAP1000 | 0.0331765 | 0.0127205 | 900 | 2.6081111 | 0.1845241 |
| kHAP100 - kHAP2500 | 0.0581517 | 0.0127205 | 900 | 4.5714895 | 0.0001901 |
| kHAP100 - kHAP5000 | 0.0792275 | 0.0127205 | 900 | 6.2283204 | 0.0000000 |
| kHAP100 - kHAP10000 | 0.0939734 | 0.0127205 | 900 | 7.3875492 | 0.0000000 |
| kHAP100 - kHAP20000 | 0.1018114 | 0.0127205 | 900 | 8.0037158 | 0.0000000 |
| kHAP100 - kHAP30000 | 0.1063477 | 0.0127205 | 900 | 8.3603307 | 0.0000000 |
| kHAP250 - kHAP500 | 0.0126908 | 0.0127205 | 900 | 0.9976616 | 0.9861121 |
| kHAP250 - kHAP1000 | 0.0283706 | 0.0127205 | 900 | 2.2303021 | 0.3867648 |
| kHAP250 - kHAP2500 | 0.0533458 | 0.0127205 | 900 | 4.1936805 | 0.0010042 |
| kHAP250 - kHAP5000 | 0.0744215 | 0.0127205 | 900 | 5.8505114 | 0.0000002 |
| kHAP250 - kHAP10000 | 0.0891675 | 0.0127205 | 900 | 7.0097401 | 0.0000000 |
| kHAP250 - kHAP20000 | 0.0970055 | 0.0127205 | 900 | 7.6259068 | 0.0000000 |
| kHAP250 - kHAP30000 | 0.1015418 | 0.0127205 | 900 | 7.9825217 | 0.0000000 |
| kHAP500 - kHAP1000 | 0.0156798 | 0.0127205 | 900 | 1.2326404 | 0.9491916 |
| kHAP500 - kHAP2500 | 0.0406550 | 0.0127205 | 900 | 3.1960188 | 0.0385344 |
| kHAP500 - kHAP5000 | 0.0617308 | 0.0127205 | 900 | 4.8528498 | 0.0000502 |
| kHAP500 - kHAP10000 | 0.0764767 | 0.0127205 | 900 | 6.0120785 | 0.0000001 |
| kHAP500 - kHAP20000 | 0.0843147 | 0.0127205 | 900 | 6.6282451 | 0.0000000 |
| kHAP500 - kHAP30000 | 0.0888510 | 0.0127205 | 900 | 6.9848600 | 0.0000000 |
| kHAP1000 - kHAP2500 | 0.0249752 | 0.0127205 | 900 | 1.9633784 | 0.5695169 |
| kHAP1000 - kHAP5000 | 0.0460509 | 0.0127205 | 900 | 3.6202093 | 0.0094059 |
| kHAP1000 - kHAP10000 | 0.0607969 | 0.0127205 | 900 | 4.7794381 | 0.0000716 |
| kHAP1000 - kHAP20000 | 0.0686349 | 0.0127205 | 900 | 5.3956047 | 0.0000031 |
| kHAP1000 - kHAP30000 | 0.0731712 | 0.0127205 | 900 | 5.7522196 | 0.0000004 |
| kHAP2500 - kHAP5000 | 0.0210757 | 0.0127205 | 900 | 1.6568309 | 0.7726237 |
| kHAP2500 - kHAP10000 | 0.0358217 | 0.0127205 | 900 | 2.8160597 | 0.1120397 |
| kHAP2500 - kHAP20000 | 0.0436597 | 0.0127205 | 900 | 3.4322263 | 0.0180539 |
| kHAP2500 - kHAP30000 | 0.0481960 | 0.0127205 | 900 | 3.7888412 | 0.0050599 |
| kHAP5000 - kHAP10000 | 0.0147460 | 0.0127205 | 900 | 1.1592287 | 0.9645710 |
| kHAP5000 - kHAP20000 | 0.0225839 | 0.0127205 | 900 | 1.7753954 | 0.6985758 |
| kHAP5000 - kHAP30000 | 0.0271203 | 0.0127205 | 900 | 2.1320103 | 0.4519694 |
| kHAP10000 - kHAP20000 | 0.0078380 | 0.0127205 | 900 | 0.6161666 | 0.9995235 |
| kHAP10000 - kHAP30000 | 0.0123743 | 0.0127205 | 900 | 0.9727815 | 0.9882167 |
| kHAP20000 - kHAP30000 | 0.0045363 | 0.0127205 | 900 | 0.3566149 | 0.9999926 |
Table showing the analysis of variance for the linear model testing for significant difference in the mean concordance of assigned haplogroups between genotyped and imputed data
| Term | df | Sum of squares | Mean square | F statistic | p-value |
|---|---|---|---|---|---|
| k_hap | 8 | 1.451465 | 0.1814331 | 153.1917 | 0 |
| Residuals | 900 | 1.065918 | 0.0011844 | NA | NA |
Table showing the estimated marginal means for the linear model testing for significant difference in the mean concordance of assigned haplogroups between genotyped and imputed data for different numbers of included reference haplotypes (k_hap)
| k_hap | Estimated marginal mean | SE | df | Lower 95% CL | Upper 95% CL |
|---|---|---|---|---|---|
| kHAP100 | -0.0219979 | 0.0034244 | 900 | -0.0287185 | -0.0152772 |
| kHAP250 | -0.0268038 | 0.0034244 | 900 | -0.0335245 | -0.0200831 |
| kHAP500 | -0.0394946 | 0.0034244 | 900 | -0.0462152 | -0.0327739 |
| kHAP1000 | -0.0551744 | 0.0034244 | 900 | -0.0618951 | -0.0484537 |
| kHAP2500 | -0.0801496 | 0.0034244 | 900 | -0.0868702 | -0.0734289 |
| kHAP5000 | -0.1012253 | 0.0034244 | 900 | -0.1079460 | -0.0945046 |
| kHAP10000 | -0.1159713 | 0.0034244 | 900 | -0.1226920 | -0.1092506 |
| kHAP20000 | -0.1238093 | 0.0034244 | 900 | -0.1305299 | -0.1170886 |
| kHAP30000 | -0.1283456 | 0.0034244 | 900 | -0.1350663 | -0.1216249 |
Table showing the contrasts for the linear model testing for significant difference in the mean concordance of assigned haplogroups between genotyped and imputed data for different numbers of included reference haplotypes (k_hap)
| Contrast | Estimate | SE | df | t ratio | p-value |
|---|---|---|---|---|---|
| kHAP100 - kHAP250 | 0.0048059 | 0.0048428 | 900 | 0.9923895 | 0.9865808 |
| kHAP100 - kHAP500 | 0.0174967 | 0.0048428 | 900 | 3.6129431 | 0.0096534 |
| kHAP100 - kHAP1000 | 0.0331765 | 0.0048428 | 900 | 6.8507146 | 0.0000000 |
| kHAP100 - kHAP2500 | 0.0581517 | 0.0048428 | 900 | 12.0079126 | 0.0000000 |
| kHAP100 - kHAP5000 | 0.0792275 | 0.0048428 | 900 | 16.3599034 | 0.0000000 |
| kHAP100 - kHAP10000 | 0.0939734 | 0.0048428 | 900 | 19.4048448 | 0.0000000 |
| kHAP100 - kHAP20000 | 0.1018114 | 0.0048428 | 900 | 21.0233271 | 0.0000000 |
| kHAP100 - kHAP30000 | 0.1063477 | 0.0048428 | 900 | 21.9600460 | 0.0000000 |
| kHAP250 - kHAP500 | 0.0126908 | 0.0048428 | 900 | 2.6205537 | 0.1794155 |
| kHAP250 - kHAP1000 | 0.0283706 | 0.0048428 | 900 | 5.8583251 | 0.0000002 |
| kHAP250 - kHAP2500 | 0.0533458 | 0.0048428 | 900 | 11.0155231 | 0.0000000 |
| kHAP250 - kHAP5000 | 0.0744215 | 0.0048428 | 900 | 15.3675140 | 0.0000000 |
| kHAP250 - kHAP10000 | 0.0891675 | 0.0048428 | 900 | 18.4124553 | 0.0000000 |
| kHAP250 - kHAP20000 | 0.0970055 | 0.0048428 | 900 | 20.0309377 | 0.0000000 |
| kHAP250 - kHAP30000 | 0.1015418 | 0.0048428 | 900 | 20.9676565 | 0.0000000 |
| kHAP500 - kHAP1000 | 0.0156798 | 0.0048428 | 900 | 3.2377715 | 0.0338735 |
| kHAP500 - kHAP2500 | 0.0406550 | 0.0048428 | 900 | 8.3949694 | 0.0000000 |
| kHAP500 - kHAP5000 | 0.0617308 | 0.0048428 | 900 | 12.7469603 | 0.0000000 |
| kHAP500 - kHAP10000 | 0.0764767 | 0.0048428 | 900 | 15.7919017 | 0.0000000 |
| kHAP500 - kHAP20000 | 0.0843147 | 0.0048428 | 900 | 17.4103840 | 0.0000000 |
| kHAP500 - kHAP30000 | 0.0888510 | 0.0048428 | 900 | 18.3471028 | 0.0000000 |
| kHAP1000 - kHAP2500 | 0.0249752 | 0.0048428 | 900 | 5.1571979 | 0.0000109 |
| kHAP1000 - kHAP5000 | 0.0460509 | 0.0048428 | 900 | 9.5091888 | 0.0000000 |
| kHAP1000 - kHAP10000 | 0.0607969 | 0.0048428 | 900 | 12.5541302 | 0.0000000 |
| kHAP1000 - kHAP20000 | 0.0686349 | 0.0048428 | 900 | 14.1726125 | 0.0000000 |
| kHAP1000 - kHAP30000 | 0.0731712 | 0.0048428 | 900 | 15.1093313 | 0.0000000 |
| kHAP2500 - kHAP5000 | 0.0210757 | 0.0048428 | 900 | 4.3519909 | 0.0005089 |
| kHAP2500 - kHAP10000 | 0.0358217 | 0.0048428 | 900 | 7.3969322 | 0.0000000 |
| kHAP2500 - kHAP20000 | 0.0436597 | 0.0048428 | 900 | 9.0154146 | 0.0000000 |
| kHAP2500 - kHAP30000 | 0.0481960 | 0.0048428 | 900 | 9.9521334 | 0.0000000 |
| kHAP5000 - kHAP10000 | 0.0147460 | 0.0048428 | 900 | 3.0449413 | 0.0602891 |
| kHAP5000 - kHAP20000 | 0.0225839 | 0.0048428 | 900 | 4.6634237 | 0.0001241 |
| kHAP5000 - kHAP30000 | 0.0271203 | 0.0048428 | 900 | 5.6001425 | 0.0000010 |
| kHAP10000 - kHAP20000 | 0.0078380 | 0.0048428 | 900 | 1.6184823 | 0.7946737 |
| kHAP10000 - kHAP30000 | 0.0123743 | 0.0048428 | 900 | 2.5552012 | 0.2073877 |
| kHAP20000 - kHAP30000 | 0.0045363 | 0.0048428 | 900 | 0.9367188 | 0.9908144 |
HaploGrep Macrohaplogrouping
This trend can be seen further when only macro-haplogroups are considered:
Compare this result with the imputed data, which shows a higher haplogroup concordance:
If the improvement in accurate assignment of haplogroups wasn’t evident from the last two plots, displaying the mean difference should make this clear:
These can be statistically tested with linear models:
Table showing the analysis of variance for the linear model testing for significant difference in the means of imputed macrohaplogroup concordance
| Term | df | Sum of squares | Mean square | F statistic | p-value |
|---|---|---|---|---|---|
| k_hap | 8 | 9.947415 | 1.2434269 | 41.6241 | 0 |
| Residuals | 900 | 26.885486 | 0.0298728 | NA | NA |
Table showing the estimated marginal means for the linear model testing for a significant difference in the means of imputed macrohaplogroup concordance across different numbers of included reference haplotypes (k_hap)

| k_hap | Estimated marginal mean | SE | df | Lower CL | Upper CL |
| --- | --- | --- | --- | --- | --- |
| kHAP100 | 0.8890144 | 0.017198 | 900 | 0.8552616 | 0.9227671 |
| kHAP250 | 0.8868615 | 0.017198 | 900 | 0.8531087 | 0.9206142 |
| kHAP500 | 0.8840795 | 0.017198 | 900 | 0.8503267 | 0.9178323 |
| kHAP1000 | 0.8789454 | 0.017198 | 900 | 0.8451926 | 0.9126981 |
| kHAP2500 | 0.8538139 | 0.017198 | 900 | 0.8200611 | 0.8875667 |
| kHAP5000 | 0.8002649 | 0.017198 | 900 | 0.7665121 | 0.8340177 |
| kHAP10000 | 0.7017161 | 0.017198 | 900 | 0.6679633 | 0.7354688 |
| kHAP20000 | 0.6522580 | 0.017198 | 900 | 0.6185052 | 0.6860108 |
| kHAP30000 | 0.6115131 | 0.017198 | 900 | 0.5777603 | 0.6452659 |
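
The shared standard error and the interval widths in the table above can be related back to the ANOVA table. The sketch below assumes a balanced design (the same number of observations at every k_hap level) and that the last two values in each row are the lower and upper confidence limits; neither assumption is stated in this section.

```python
# Residual mean square from the ANOVA table and the SE shared by every level.
ms_resid = 0.0298728
se_emm = 0.017198
mean, lower, upper = 0.8890144, 0.8552616, 0.9227671  # kHAP100 row

# Under a balanced design, SE = sqrt(MS_resid / n), so n = MS_resid / SE^2.
n_per_level = ms_resid / se_emm ** 2

# Half-width of the interval expressed in SE units; a 95% t interval
# on 900 degrees of freedom should give a value close to 1.96.
multiplier = (upper - mean) / se_emm

print(round(n_per_level, 1), round(multiplier, 3))
```

Roughly 101 observations per level would be consistent with the 900 residual degrees of freedom (9 levels x 101 observations, minus 9 estimated means), and the multiplier near 1.96 suggests the limits are 95% confidence limits.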
Table showing the contrasts for the linear model testing for a significant difference in the means of imputed macrohaplogroup concordance across different numbers of included reference haplotypes (k_hap)

| Contrast | Estimate | SE | df | t ratio | p-value |
| --- | --- | --- | --- | --- | --- |
| kHAP100 - kHAP250 | 0.0021529 | 0.0243216 | 900 | 0.0885180 | 1.0000000 |
| kHAP100 - kHAP500 | 0.0049349 | 0.0243216 | 900 | 0.2029006 | 0.9999999 |
| kHAP100 - kHAP1000 | 0.0100690 | 0.0243216 | 900 | 0.4139944 | 0.9999766 |
| kHAP100 - kHAP2500 | 0.0352005 | 0.0243216 | 900 | 1.4472935 | 0.8790296 |
| kHAP100 - kHAP5000 | 0.0887494 | 0.0243216 | 900 | 3.6489981 | 0.0084808 |
| kHAP100 - kHAP10000 | 0.1872983 | 0.0243216 | 900 | 7.7009056 | 0.0000000 |
| kHAP100 - kHAP20000 | 0.2367564 | 0.0243216 | 900 | 9.7344100 | 0.0000000 |
| kHAP100 - kHAP30000 | 0.2775012 | 0.0243216 | 900 | 11.4096654 | 0.0000000 |
| kHAP250 - kHAP500 | 0.0027820 | 0.0243216 | 900 | 0.1143826 | 1.0000000 |
| kHAP250 - kHAP1000 | 0.0079161 | 0.0243216 | 900 | 0.3254764 | 0.9999964 |
| kHAP250 - kHAP2500 | 0.0330476 | 0.0243216 | 900 | 1.3587754 | 0.9127740 |
| kHAP250 - kHAP5000 | 0.0865965 | 0.0243216 | 900 | 3.5604801 | 0.0116238 |
| kHAP250 - kHAP10000 | 0.1851454 | 0.0243216 | 900 | 7.6123876 | 0.0000000 |
| kHAP250 - kHAP20000 | 0.2346035 | 0.0243216 | 900 | 9.6458920 | 0.0000000 |
| kHAP250 - kHAP30000 | 0.2753483 | 0.0243216 | 900 | 11.3211474 | 0.0000000 |
| kHAP500 - kHAP1000 | 0.0051341 | 0.0243216 | 900 | 0.2110938 | 0.9999999 |
| kHAP500 - kHAP2500 | 0.0302656 | 0.0243216 | 900 | 1.2443928 | 0.9463509 |
| kHAP500 - kHAP5000 | 0.0838146 | 0.0243216 | 900 | 3.4460974 | 0.0172307 |
| kHAP500 - kHAP10000 | 0.1823634 | 0.0243216 | 900 | 7.4980050 | 0.0000000 |
| kHAP500 - kHAP20000 | 0.2318215 | 0.0243216 | 900 | 9.5315094 | 0.0000000 |
| kHAP500 - kHAP30000 | 0.2725664 | 0.0243216 | 900 | 11.2067648 | 0.0000000 |
| kHAP1000 - kHAP2500 | 0.0251315 | 0.0243216 | 900 | 1.0332991 | 0.9826016 |
| kHAP1000 - kHAP5000 | 0.0786804 | 0.0243216 | 900 | 3.2350037 | 0.0341666 |
| kHAP1000 - kHAP10000 | 0.1772293 | 0.0243216 | 900 | 7.2869113 | 0.0000000 |
| kHAP1000 - kHAP20000 | 0.2266873 | 0.0243216 | 900 | 9.3204156 | 0.0000000 |
| kHAP1000 - kHAP30000 | 0.2674322 | 0.0243216 | 900 | 10.9956710 | 0.0000000 |
| kHAP2500 - kHAP5000 | 0.0535490 | 0.0243216 | 900 | 2.2017046 | 0.4053450 |
| kHAP2500 - kHAP10000 | 0.1520978 | 0.0243216 | 900 | 6.2536122 | 0.0000000 |
| kHAP2500 - kHAP20000 | 0.2015559 | 0.0243216 | 900 | 8.2871165 | 0.0000000 |
| kHAP2500 - kHAP30000 | 0.2423007 | 0.0243216 | 900 | 9.9623719 | 0.0000000 |
| kHAP5000 - kHAP10000 | 0.0985488 | 0.0243216 | 900 | 4.0519076 | 0.0018051 |
| kHAP5000 - kHAP20000 | 0.1480069 | 0.0243216 | 900 | 6.0854119 | 0.0000001 |
| kHAP5000 - kHAP30000 | 0.1887518 | 0.0243216 | 900 | 7.7606673 | 0.0000000 |
| kHAP10000 - kHAP20000 | 0.0494581 | 0.0243216 | 900 | 2.0335043 | 0.5201945 |
| kHAP10000 - kHAP30000 | 0.0902029 | 0.0243216 | 900 | 3.7087598 | 0.0068198 |
| kHAP20000 - kHAP30000 | 0.0407449 | 0.0243216 | 900 | 1.6752554 | 0.7616696 |
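
Each contrast row above can be traced back to the estimated marginal means. The sketch below recomputes one row (kHAP100 - kHAP30000) under the assumption that the two means are independent with equal standard errors, and then compares a reported p-value with a rough unadjusted one. The normal approximation and the function name `unadjusted_p` are my own choices for illustration, not part of the original analysis.

```python
import math
from statistics import NormalDist

# Estimated marginal means and their shared SE, copied from the tables above.
emm = {"kHAP100": 0.8890144, "kHAP30000": 0.6115131}
se_emm = 0.017198

estimate = emm["kHAP100"] - emm["kHAP30000"]
se_contrast = math.sqrt(2) * se_emm   # difference of two independent means
t_ratio = estimate / se_contrast

def unadjusted_p(t):
    """Two-sided p-value from a normal approximation to t on 900 df."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

# The kHAP100 - kHAP250 row has t = 0.0885180 and a reported p of 1.0000000.
p_first_row = unadjusted_p(0.0885180)

print(round(estimate, 6), round(se_contrast, 6), round(t_ratio, 2))
print(round(p_first_row, 2))
```

The estimate, SE and t ratio reproduce the table to rounding. The unadjusted p-value for the first row comes out well below the reported 1.0000000, which suggests the reported p-values have been adjusted for multiple comparisons; the adjustment method is not stated in this section.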
Table showing the residuals for the linear model testing for significant difference in the mean concordance of assigned macrohaplogroups between genotyped and imputed data

| Term | df | Sum of squares | Mean square | F statistic | p-value |
| --- | --- | --- | --- | --- | --- |
| k_hap | 8 | 9.947415 | 1.2434269 | 318.3801 | 0 |
| Residuals | 900 | 3.514932 | 0.0039055 | NA | NA |
Table showing the estimated marginal means for the linear model testing for a significant difference in the mean concordance of assigned macrohaplogroups between genotyped and imputed data across different numbers of included reference haplotypes (k_hap)

| k_hap | Estimated marginal mean | SE | df | Lower CL | Upper CL |
| --- | --- | --- | --- | --- | --- |
| kHAP100 | 0.0068846 | 0.0062184 | 900 | -0.0053196 | 0.0190888 |
| kHAP250 | 0.0047317 | 0.0062184 | 900 | -0.0074725 | 0.0169359 |
| kHAP500 | 0.0019497 | 0.0062184 | 900 | -0.0102545 | 0.0141539 |
| kHAP1000 | -0.0031844 | 0.0062184 | 900 | -0.0153886 | 0.0090198 |
| kHAP2500 | -0.0283159 | 0.0062184 | 900 | -0.0405201 | -0.0161117 |
| kHAP5000 | -0.0818649 | 0.0062184 | 900 | -0.0940690 | -0.0696607 |
| kHAP10000 | -0.1804137 | 0.0062184 | 900 | -0.1926179 | -0.1682095 |
| kHAP20000 | -0.2298718 | 0.0062184 | 900 | -0.2420760 | -0.2176676 |
| kHAP30000 | -0.2706166 | 0.0062184 | 900 | -0.2828208 | -0.2584125 |
Table showing the contrasts for the linear model testing for a significant difference in the mean concordance of assigned macrohaplogroups between genotyped and imputed data across different numbers of included reference haplotypes (k_hap)

| Contrast | Estimate | SE | df | t ratio | p-value |
| --- | --- | --- | --- | --- | --- |
| kHAP100 - kHAP250 | 0.0021529 | 0.0087941 | 900 | 0.2448117 | 0.9999996 |
| kHAP100 - kHAP500 | 0.0049349 | 0.0087941 | 900 | 0.5611563 | 0.9997625 |
| kHAP100 - kHAP1000 | 0.0100690 | 0.0087941 | 900 | 1.1449722 | 0.9671100 |
| kHAP100 - kHAP2500 | 0.0352005 | 0.0087941 | 900 | 4.0027374 | 0.0022012 |
| kHAP100 - kHAP5000 | 0.0887494 | 0.0087941 | 900 | 10.0919279 | 0.0000000 |
| kHAP100 - kHAP10000 | 0.1872983 | 0.0087941 | 900 | 21.2981709 | 0.0000000 |
| kHAP100 - kHAP20000 | 0.2367564 | 0.0087941 | 900 | 26.9221748 | 0.0000000 |
| kHAP100 - kHAP30000 | 0.2775012 | 0.0087941 | 900 | 31.5553800 | 0.0000000 |
| kHAP250 - kHAP500 | 0.0027820 | 0.0087941 | 900 | 0.3163447 | 0.9999971 |
| kHAP250 - kHAP1000 | 0.0079161 | 0.0087941 | 900 | 0.9001605 | 0.9929625 |
| kHAP250 - kHAP2500 | 0.0330476 | 0.0087941 | 900 | 3.7579258 | 0.0056827 |
| kHAP250 - kHAP5000 | 0.0865965 | 0.0087941 | 900 | 9.8471162 | 0.0000000 |
| kHAP250 - kHAP10000 | 0.1851454 | 0.0087941 | 900 | 21.0533593 | 0.0000000 |
| kHAP250 - kHAP20000 | 0.2346035 | 0.0087941 | 900 | 26.6773631 | 0.0000000 |
| kHAP250 - kHAP30000 | 0.2753483 | 0.0087941 | 900 | 31.3105683 | 0.0000000 |
| kHAP500 - kHAP1000 | 0.0051341 | 0.0087941 | 900 | 0.5838159 | 0.9996807 |
| kHAP500 - kHAP2500 | 0.0302656 | 0.0087941 | 900 | 3.4415811 | 0.0174949 |
| kHAP500 - kHAP5000 | 0.0838146 | 0.0087941 | 900 | 9.5307715 | 0.0000000 |
| kHAP500 - kHAP10000 | 0.1823634 | 0.0087941 | 900 | 20.7370146 | 0.0000000 |
| kHAP500 - kHAP20000 | 0.2318215 | 0.0087941 | 900 | 26.3610184 | 0.0000000 |
| kHAP500 - kHAP30000 | 0.2725664 | 0.0087941 | 900 | 30.9942237 | 0.0000000 |
| kHAP1000 - kHAP2500 | 0.0251315 | 0.0087941 | 900 | 2.8577653 | 0.1006177 |
| kHAP1000 - kHAP5000 | 0.0786804 | 0.0087941 | 900 | 8.9469557 | 0.0000000 |
| kHAP1000 - kHAP10000 | 0.1772293 | 0.0087941 | 900 | 20.1531987 | 0.0000000 |
| kHAP1000 - kHAP20000 | 0.2266873 | 0.0087941 | 900 | 25.7772026 | 0.0000000 |
| kHAP1000 - kHAP30000 | 0.2674322 | 0.0087941 | 900 | 30.4104078 | 0.0000000 |
| kHAP2500 - kHAP5000 | 0.0535490 | 0.0087941 | 900 | 6.0891904 | 0.0000001 |
| kHAP2500 - kHAP10000 | 0.1520978 | 0.0087941 | 900 | 17.2954335 | 0.0000000 |
| kHAP2500 - kHAP20000 | 0.2015559 | 0.0087941 | 900 | 22.9194373 | 0.0000000 |
| kHAP2500 - kHAP30000 | 0.2423007 | 0.0087941 | 900 | 27.5526426 | 0.0000000 |
| kHAP5000 - kHAP10000 | 0.0985488 | 0.0087941 | 900 | 11.2062431 | 0.0000000 |
| kHAP5000 - kHAP20000 | 0.1480069 | 0.0087941 | 900 | 16.8302469 | 0.0000000 |
| kHAP5000 - kHAP30000 | 0.1887518 | 0.0087941 | 900 | 21.4634521 | 0.0000000 |
| kHAP10000 - kHAP20000 | 0.0494581 | 0.0087941 | 900 | 5.6240038 | 0.0000009 |
| kHAP10000 - kHAP30000 | 0.0902029 | 0.0087941 | 900 | 10.2572091 | 0.0000000 |
| kHAP20000 - kHAP30000 | 0.0407449 | 0.0087941 | 900 | 4.6332052 | 0.0001429 |
HaploGrep Haplogrouping (with info > 0.3 cutoff)
It should be noted that, by convention, imputed variants with an IMPUTE2 info score of info <= 0.3 are excluded from the final datasets. As such, I have also displayed these results after excluding any imputed sites with an info score of info <= 0.3.
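
To make the cutoff concrete, here is a minimal sketch of the filtering rule described above. The records, field names and the helper `passes_cutoff` are hypothetical and only illustrate the rule; they are not the pipeline used to produce these results.

```python
INFO_CUTOFF = 0.3  # imputed sites with info <= 0.3 are excluded

# Hypothetical variant records; positions and scores are made up.
variants = [
    {"pos": 73, "imputed": True, "info": 0.95},
    {"pos": 150, "imputed": True, "info": 0.30},    # exactly at the cutoff
    {"pos": 263, "imputed": False, "info": None},   # genotyped on the array
    {"pos": 750, "imputed": True, "info": 0.12},
]

def passes_cutoff(variant, cutoff=INFO_CUTOFF):
    """Keep genotyped sites; keep imputed sites only if info > cutoff."""
    if not variant["imputed"]:
        return True
    return variant["info"] > cutoff

kept = [v["pos"] for v in variants if passes_cutoff(v)]
print(kept)
```

Note that a site with an info score of exactly 0.3 is dropped, since the rule excludes info <= 0.3.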
Imputed haplogroup concordance:
Difference in haplogroup concordance between genotyped and imputed datasets (with cutoff info <= 0.3):
Table showing the residuals for the linear model testing for significant difference in the means of imputed haplogroup concordance
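
| Term | df | Sum of squares | Mean square | F statistic | p-value |
| --- | --- | --- | --- | --- | --- |
| k_hap | 8 | 0.1359959 | 0.0169995 | 1.807766 | 0.0720361 |
| Residuals | 900 | 8.4632264 | 0.0094036 | NA | NA |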
Table showing the estimated marginal means for the linear model testing for a significant difference in the means of imputed haplogroup concordance across different numbers of included reference haplotypes (k_hap)

| k_hap | Estimated marginal mean | SE | df | Lower CL | Upper CL |
| --- | --- | --- | --- | --- | --- |
| kHAP100 | 0.1921550 | 0.0096491 | 900 | 0.1732177 | 0.2110923 |
| kHAP250 | 0.1912446 | 0.0096491 | 900 | 0.1723073 | 0.2101819 |
| kHAP500 | 0.1821798 | 0.0096491 | 900 | 0.1632425 | 0.2011171 |
| kHAP1000 | 0.1722632 | 0.0096491 | 900 | 0.1533258 | 0.1912005 |
| kHAP2500 | 0.1650621 | 0.0096491 | 900 | 0.1461248 | 0.1839994 |
| kHAP5000 | 0.1617448 | 0.0096491 | 900 | 0.1428075 | 0.1806821 |
| kHAP10000 | 0.1754163 | 0.0096491 | 900 | 0.1564790 | 0.1943536 |
| kHAP20000 | 0.1834067 | 0.0096491 | 900 | 0.1644693 | 0.2023440 |
| kHAP30000 | 0.2005048 | 0.0096491 | 900 | 0.1815675 | 0.2194421 |
Table showing the contrasts for the linear model testing for a significant difference in the means of imputed haplogroup concordance across different numbers of included reference haplotypes (k_hap)

| Contrast | Estimate | SE | df | t ratio | p-value |
| --- | --- | --- | --- | --- | --- |
| kHAP100 - kHAP250 | 0.0009104 | 0.0136459 | 900 | 0.0667155 | 1.0000000 |
| kHAP100 - kHAP500 | 0.0099752 | 0.0136459 | 900 | 0.7310074 | 0.9983468 |
| kHAP100 - kHAP1000 | 0.0198918 | 0.0136459 | 900 | 1.4577199 | 0.8746005 |
| kHAP100 - kHAP2500 | 0.0270929 | 0.0136459 | 900 | 1.9854311 | 0.5539929 |
| kHAP100 - kHAP5000 | 0.0304102 | 0.0136459 | 900 | 2.2285276 | 0.3879072 |
| kHAP100 - kHAP10000 | 0.0167387 | 0.0136459 | 900 | 1.2266494 | 0.9505981 |
| kHAP100 - kHAP20000 | 0.0087483 | 0.0136459 | 900 | 0.6410990 | 0.9993618 |
| kHAP100 - kHAP30000 | -0.0083498 | 0.0136459 | 900 | -0.6118930 | 0.9995474 |
| kHAP250 - kHAP500 | 0.0090648 | 0.0136459 | 900 | 0.6642919 | 0.9991721 |
| kHAP250 - kHAP1000 | 0.0189815 | 0.0136459 | 900 | 1.3910044 | 0.9012870 |
| kHAP250 - kHAP2500 | 0.0261825 | 0.0136459 | 900 | 1.9187156 | 0.6008699 |
| kHAP250 - kHAP5000 | 0.0294998 | 0.0136459 | 900 | 2.1618120 | 0.4318207 |
| kHAP250 - kHAP10000 | 0.0158283 | 0.0136459 | 900 | 1.1599339 | 0.9644418 |
| kHAP250 - kHAP20000 | 0.0078380 | 0.0136459 | 900 | 0.5743834 | 0.9997172 |
| kHAP250 - kHAP30000 | -0.0092602 | 0.0136459 | 900 | -0.6786085 | 0.9990331 |
| kHAP500 - kHAP1000 | 0.0099166 | 0.0136459 | 900 | 0.7267124 | 0.9984151 |
| kHAP500 - kHAP2500 | 0.0171177 | 0.0136459 | 900 | 1.2544236 | 0.9438398 |
| kHAP500 - kHAP5000 | 0.0204350 | 0.0136459 | 900 | 1.4975201 | 0.8568223 |
| kHAP500 - kHAP10000 | 0.0067635 | 0.0136459 | 900 | 0.4956419 | 0.9999070 |
| kHAP500 - kHAP20000 | -0.0012269 | 0.0136459 | 900 | -0.0899085 | 1.0000000 |
| kHAP500 - kHAP30000 | -0.0183250 | 0.0136459 | 900 | -1.3429005 | 0.9180982 |
| kHAP1000 - kHAP2500 | 0.0072011 | 0.0136459 | 900 | 0.5277112 | 0.9998504 |
| kHAP1000 - kHAP5000 | 0.0105183 | 0.0136459 | 900 | 0.7708077 | 0.9975902 |
| kHAP1000 - kHAP10000 | -0.0031532 | 0.0136459 | 900 | -0.2310705 | 0.9999998 |
| kHAP1000 - kHAP20000 | -0.0111435 | 0.0136459 | 900 | -0.8166209 | 0.9963884 |
| kHAP1000 - kHAP30000 | -0.0282417 | 0.0136459 | 900 | -2.0696129 | 0.4949611 |
| kHAP2500 - kHAP5000 | 0.0033173 | 0.0136459 | 900 | 0.2430965 | 0.9999996 |
| kHAP2500 - kHAP10000 | -0.0103542 | 0.0136459 | 900 | -0.7587817 | 0.9978440 |
| kHAP2500 - kHAP20000 | -0.0183446 | 0.0136459 | 900 | -1.3443321 | 0.9176271 |
| kHAP2500 - kHAP30000 | -0.0354427 | 0.0136459 | 900 | -2.5973241 | 0.1890354 |
| kHAP5000 - kHAP10000 | -0.0136715 | 0.0136459 | 900 | -1.0018782 | 0.9857282 |
| kHAP5000 - kHAP20000 | -0.0216618 | 0.0136459 | 900 | -1.5874286 | 0.8117404 |
| kHAP5000 - kHAP30000 | -0.0387600 | 0.0136459 | 900 | -2.8404206 | 0.1052504 |
| kHAP10000 - kHAP20000 | -0.0079903 | 0.0136459 | 900 | -0.5855504 | 0.9996736 |
| kHAP10000 - kHAP30000 | -0.0250885 | 0.0136459 | 900 | -1.8385424 | 0.6563077 |
| kHAP20000 - kHAP30000 | -0.0170982 | 0.0136459 | 900 | -1.2529920 | 0.9442031 |
Table showing the residuals for the linear model testing for significant difference in the mean concordance of assigned haplogroups between genotyped and imputed data
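
| Term | df | Sum of squares | Mean square | F statistic | p-value |
| --- | --- | --- | --- | --- | --- |
| k_hap | 8 | 0.1359959 | 0.0169995 | 49.16032 | 0 |
| Residuals | 900 | 0.3112172 | 0.0003458 | NA | NA |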
Table showing the estimated marginal means for the linear model testing for a significant difference in the mean concordance of assigned haplogroups between genotyped and imputed data across different numbers of included reference haplotypes (k_hap)

| k_hap | Estimated marginal mean | SE | df | Lower CL | Upper CL |
| --- | --- | --- | --- | --- | --- |
| kHAP100 | -0.0147225 | 0.0018503 | 900 | -0.0183540 | -0.0110911 |
| kHAP250 | -0.0156329 | 0.0018503 | 900 | -0.0192644 | -0.0120015 |
| kHAP500 | -0.0246978 | 0.0018503 | 900 | -0.0283292 | -0.0210663 |
| kHAP1000 | -0.0346144 | 0.0018503 | 900 | -0.0382459 | -0.0309829 |
| kHAP2500 | -0.0418155 | 0.0018503 | 900 | -0.0454469 | -0.0381840 |
| kHAP5000 | -0.0451327 | 0.0018503 | 900 | -0.0487642 | -0.0415013 |
| kHAP10000 | -0.0314612 | 0.0018503 | 900 | -0.0350927 | -0.0278298 |
| kHAP20000 | -0.0234709 | 0.0018503 | 900 | -0.0271024 | -0.0198394 |
| kHAP30000 | -0.0063727 | 0.0018503 | 900 | -0.0100042 | -0.0027413 |
Table showing the contrasts for the linear model testing for a significant difference in the mean concordance of assigned haplogroups between genotyped and imputed data across different numbers of included reference haplotypes (k_hap)

| Contrast | Estimate | SE | df | t ratio | p-value |
| --- | --- | --- | --- | --- | --- |
| kHAP100 - kHAP250 | 0.0009104 | 0.0026168 | 900 | 0.3479069 | 0.9999939 |
| kHAP100 - kHAP500 | 0.0099752 | 0.0026168 | 900 | 3.8120447 | 0.0046344 |
| kHAP100 - kHAP1000 | 0.0198918 | 0.0026168 | 900 | 7.6016920 | 0.0000000 |
| kHAP100 - kHAP2500 | 0.0270929 | 0.0026168 | 900 | 10.3535911 | 0.0000000 |
| kHAP100 - kHAP5000 | 0.0304102 | 0.0026168 | 900 | 11.6212863 | 0.0000000 |
| kHAP100 - kHAP10000 | 0.0167387 | 0.0026168 | 900 | 6.3967096 | 0.0000000 |
| kHAP100 - kHAP20000 | 0.0087483 | 0.0026168 | 900 | 3.3431916 | 0.0242232 |
| kHAP100 - kHAP30000 | -0.0083498 | 0.0026168 | 900 | -3.1908890 | 0.0391436 |
| kHAP250 - kHAP500 | 0.0090648 | 0.0026168 | 900 | 3.4641378 | 0.0162103 |
| kHAP250 - kHAP1000 | 0.0189815 | 0.0026168 | 900 | 7.2537851 | 0.0000000 |
| kHAP250 - kHAP2500 | 0.0261825 | 0.0026168 | 900 | 10.0056842 | 0.0000000 |
| kHAP250 - kHAP5000 | 0.0294998 | 0.0026168 | 900 | 11.2733794 | 0.0000000 |
| kHAP250 - kHAP10000 | 0.0158283 | 0.0026168 | 900 | 6.0488026 | 0.0000001 |
| kHAP250 - kHAP20000 | 0.0078380 | 0.0026168 | 900 | 2.9952846 | 0.0693853 |
| kHAP250 - kHAP30000 | -0.0092602 | 0.0026168 | 900 | -3.5387959 | 0.0125394 |
| kHAP500 - kHAP1000 | 0.0099166 | 0.0026168 | 900 | 3.7896473 | 0.0050445 |
| kHAP500 - kHAP2500 | 0.0171177 | 0.0026168 | 900 | 6.5415464 | 0.0000000 |
| kHAP500 - kHAP5000 | 0.0204350 | 0.0026168 | 900 | 7.8092416 | 0.0000000 |
| kHAP500 - kHAP10000 | 0.0067635 | 0.0026168 | 900 | 2.5846649 | 0.1944275 |
| kHAP500 - kHAP20000 | -0.0012269 | 0.0026168 | 900 | -0.4688531 | 0.9999391 |
| kHAP500 - kHAP30000 | -0.0183250 | 0.0026168 | 900 | -7.0029337 | 0.0000000 |
| kHAP1000 - kHAP2500 | 0.0072011 | 0.0026168 | 900 | 2.7518991 | 0.1315597 |
| kHAP1000 - kHAP5000 | 0.0105183 | 0.0026168 | 900 | 4.0195944 | 0.0020571 |
| kHAP1000 - kHAP10000 | -0.0031532 | 0.0026168 | 900 | -1.2049824 | 0.9554549 |
| kHAP1000 - kHAP20000 | -0.0111435 | 0.0026168 | 900 | -4.2585004 | 0.0007626 |
| kHAP1000 - kHAP30000 | -0.0282417 | 0.0026168 | 900 | -10.7925810 | 0.0000000 |
| kHAP2500 - kHAP5000 | 0.0033173 | 0.0026168 | 900 | 1.2676952 | 0.9403933 |
| kHAP2500 - kHAP10000 | -0.0103542 | 0.0026168 | 900 | -3.9568815 | 0.0026424 |
| kHAP2500 - kHAP20000 | -0.0183446 | 0.0026168 | 900 | -7.0103995 | 0.0000000 |
| kHAP2500 - kHAP30000 | -0.0354427 | 0.0026168 | 900 | -13.5444801 | 0.0000000 |
| kHAP5000 - kHAP10000 | -0.0136715 | 0.0026168 | 900 | -5.2245768 | 0.0000077 |
| kHAP5000 - kHAP20000 | -0.0216618 | 0.0026168 | 900 | -8.2780948 | 0.0000000 |
| kHAP5000 - kHAP30000 | -0.0387600 | 0.0026168 | 900 | -14.8121753 | 0.0000000 |
| kHAP10000 - kHAP20000 | -0.0079903 | 0.0026168 | 900 | -3.0535180 | 0.0588237 |
| kHAP10000 - kHAP30000 | -0.0250885 | 0.0026168 | 900 | -9.5875986 | 0.0000000 |
| kHAP20000 - kHAP30000 | -0.0170982 | 0.0026168 | 900 | -6.5340806 | 0.0000000 |
HaploGrep Macrohaplogrouping (with info > 0.3 cutoff)
This trend can be seen further when only macro-haplogroups are considered:
Compare this result with the imputed data, which shows a higher haplogroup concordance:
If the improvement in accurate assignment of haplogroups wasn’t evident from the last two plots, displaying the mean difference should make this clear:
These can be statistically tested with linear models:
Table showing the residuals for the linear model testing for significant difference in the means of imputed macrohaplogroup concordance
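
| Term | df | Sum of squares | Mean square | F statistic | p-value |
| --- | --- | --- | --- | --- | --- |
| k_hap | 8 | 0.4348943 | 0.0543618 | 2.502299 | 0.0108598 |
| Residuals | 900 | 19.5522685 | 0.0217247 | NA | NA |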
Table showing the estimated marginal means for the linear model testing for a significant difference in the means of imputed macrohaplogroup concordance across different numbers of included reference haplotypes (k_hap)

| k_hap | Estimated marginal mean | SE | df | Lower CL | Upper CL |
| --- | --- | --- | --- | --- | --- |
| kHAP100 | 0.8956450 | 0.0146662 | 900 | 0.8668611 | 0.9244288 |
| kHAP250 | 0.8937851 | 0.0146662 | 900 | 0.8650012 | 0.9225690 |
| kHAP500 | 0.8899990 | 0.0146662 | 900 | 0.8612151 | 0.9187829 |
| kHAP1000 | 0.8851149 | 0.0146662 | 900 | 0.8563310 | 0.9138988 |
| kHAP2500 | 0.8756515 | 0.0146662 | 900 | 0.8468677 | 0.9044354 |
| kHAP5000 | 0.8520048 | 0.0146662 | 900 | 0.8232209 | 0.8807887 |
| kHAP10000 | 0.8438504 | 0.0146662 | 900 | 0.8150665 | 0.8726342 |
| kHAP20000 | 0.8324138 | 0.0146662 | 900 | 0.8036300 | 0.8611977 |
| kHAP30000 | 0.8675830 | 0.0146662 | 900 | 0.8387992 | 0.8963669 |
Table showing the contrasts for the linear model testing for a significant difference in the means of imputed macrohaplogroup concordance across different numbers of included reference haplotypes (k_hap)

| Contrast | Estimate | SE | df | t ratio | p-value |
| --- | --- | --- | --- | --- | --- |
| kHAP100 - kHAP250 | 0.0018599 | 0.0207411 | 900 | 0.0896700 | 1.0000000 |
| kHAP100 - kHAP500 | 0.0056460 | 0.0207411 | 900 | 0.2722126 | 0.9999991 |
| kHAP100 - kHAP1000 | 0.0105301 | 0.0207411 | 900 | 0.5076906 | 0.9998883 |
| kHAP100 - kHAP2500 | 0.0199934 | 0.0207411 | 900 | 0.9639529 | 0.9889003 |
| kHAP100 - kHAP5000 | 0.0436402 | 0.0207411 | 900 | 2.1040433 | 0.4711207 |
| kHAP100 - kHAP10000 | 0.0517946 | 0.0207411 | 900 | 2.4971974 | 0.2345817 |
| kHAP100 - kHAP20000 | 0.0632311 | 0.0207411 | 900 | 3.0485928 | 0.0596615 |
| kHAP100 - kHAP30000 | 0.0280619 | 0.0207411 | 900 | 1.3529626 | 0.9147490 |
| kHAP250 - kHAP500 | 0.0037861 | 0.0207411 | 900 | 0.1825426 | 1.0000000 |
| kHAP250 - kHAP1000 | 0.0086702 | 0.0207411 | 900 | 0.4180206 | 0.9999748 |
| kHAP250 - kHAP2500 | 0.0181336 | 0.0207411 | 900 | 0.8742828 | 0.9942242 |
| kHAP250 - kHAP5000 | 0.0417803 | 0.0207411 | 900 | 2.0143733 | 0.5336276 |
| kHAP250 - kHAP10000 | 0.0499347 | 0.0207411 | 900 | 2.4075274 | 0.2809498 |
| kHAP250 - kHAP20000 | 0.0613713 | 0.0207411 | 900 | 2.9589228 | 0.0767433 |
| kHAP250 - kHAP30000 | 0.0262021 | 0.0207411 | 900 | 1.2632925 | 0.9415524 |
| kHAP500 - kHAP1000 | 0.0048841 | 0.0207411 | 900 | 0.2354780 | 0.9999997 |
| kHAP500 - kHAP2500 | 0.0143474 | 0.0207411 | 900 | 0.6917403 | 0.9988889 |
| kHAP500 - kHAP5000 | 0.0379942 | 0.0207411 | 900 | 1.8318307 | 0.6608708 |
| kHAP500 - kHAP10000 | 0.0461486 | 0.0207411 | 900 | 2.2249848 | 0.3901923 |
| kHAP500 - kHAP20000 | 0.0575852 | 0.0207411 | 900 | 2.7763802 | 0.1238253 |
| kHAP500 - kHAP30000 | 0.0224159 | 0.0207411 | 900 | 1.0807500 | 0.9769141 |
| kHAP1000 - kHAP2500 | 0.0094634 | 0.0207411 | 900 | 0.4562622 | 0.9999505 |
| kHAP1000 - kHAP5000 | 0.0331101 | 0.0207411 | 900 | 1.5963527 | 0.8069109 |
| kHAP1000 - kHAP10000 | 0.0412645 | 0.0207411 | 900 | 1.9895068 | 0.5511235 |
| kHAP1000 - kHAP20000 | 0.0527011 | 0.0207411 | 900 | 2.5409022 | 0.2138847 |
| kHAP1000 - kHAP30000 | 0.0175319 | 0.0207411 | 900 | 0.8452719 | 0.9954155 |
| kHAP2500 - kHAP5000 | 0.0236467 | 0.0207411 | 900 | 1.1400904 | 0.9679479 |
| kHAP2500 - kHAP10000 | 0.0318012 | 0.0207411 | 900 | 1.5332446 | 0.8397080 |
| kHAP2500 - kHAP20000 | 0.0432377 | 0.0207411 | 900 | 2.0846399 | 0.4845247 |
| kHAP2500 - kHAP30000 | 0.0080685 | 0.0207411 | 900 | 0.3890097 | 0.9999855 |
| kHAP5000 - kHAP10000 | 0.0081544 | 0.0207411 | 900 | 0.3931541 | 0.9999843 |
| kHAP5000 - kHAP20000 | 0.0195910 | 0.0207411 | 900 | 0.9445495 | 0.9902930 |
| kHAP5000 - kHAP30000 | -0.0155782 | 0.0207411 | 900 | -0.7510807 | 0.9979946 |
| kHAP10000 - kHAP20000 | 0.0114365 | 0.0207411 | 900 | 0.5513954 | 0.9997918 |
| kHAP10000 - kHAP30000 | -0.0237327 | 0.0207411 | 900 | -1.1442349 | 0.9672376 |
| kHAP20000 - kHAP30000 | -0.0351692 | 0.0207411 | 900 | -1.6956302 | 0.7493002 |
Table showing the residuals for the linear model testing for significant difference in the mean concordance of assigned macrohaplogroups between genotyped and imputed data

| Term | df | Sum of squares | Mean square | F statistic | p-value |
| --- | --- | --- | --- | --- | --- |
| k_hap | 8 | 0.4348943 | 0.0543618 | 23.71659 | 0 |
| Residuals | 900 | 2.0629282 | 0.0022921 | NA | NA |
Table showing the estimated marginal means for the linear model testing for a significant difference in the mean concordance of assigned macrohaplogroups between genotyped and imputed data across different numbers of included reference haplotypes (k_hap)

| k_hap | Estimated marginal mean | SE | df | Lower CL | Upper CL |
| --- | --- | --- | --- | --- | --- |
| kHAP100 | 0.0135152 | 0.0047639 | 900 | 0.0041656 | 0.0228648 |
| kHAP250 | 0.0116553 | 0.0047639 | 900 | 0.0023058 | 0.0210049 |
| kHAP500 | 0.0078692 | 0.0047639 | 900 | -0.0014804 | 0.0172188 |
| kHAP1000 | 0.0029851 | 0.0047639 | 900 | -0.0063644 | 0.0123347 |
| kHAP2500 | -0.0064782 | 0.0047639 | 900 | -0.0158278 | 0.0028714 |
| kHAP5000 | -0.0301250 | 0.0047639 | 900 | -0.0394745 | -0.0207754 |
| kHAP10000 | -0.0382794 | 0.0047639 | 900 | -0.0476290 | -0.0289298 |
| kHAP20000 | -0.0497159 | 0.0047639 | 900 | -0.0590655 | -0.0403664 |
| kHAP30000 | -0.0145467 | 0.0047639 | 900 | -0.0238963 | -0.0051971 |
Table showing the contrasts for the linear model testing for a significant difference in the mean concordance of assigned macrohaplogroups between genotyped and imputed data across different numbers of included reference haplotypes (k_hap)

| Contrast | Estimate | SE | df | t ratio | p-value |
| --- | --- | --- | --- | --- | --- |
| kHAP100 - kHAP250 | 0.0018599 | 0.0067371 | 900 | 0.2760602 | 0.9999990 |
| kHAP100 - kHAP500 | 0.0056460 | 0.0067371 | 900 | 0.8380400 | 0.9956790 |
| kHAP100 - kHAP1000 | 0.0105301 | 0.0067371 | 900 | 1.5629881 | 0.8246482 |
| kHAP100 - kHAP2500 | 0.0199934 | 0.0067371 | 900 | 2.9676476 | 0.0749219 |
| kHAP100 - kHAP5000 | 0.0436402 | 0.0067371 | 900 | 6.4775563 | 0.0000000 |
| kHAP100 - kHAP10000 | 0.0517946 | 0.0067371 | 900 | 7.6879297 | 0.0000000 |
| kHAP100 - kHAP20000 | 0.0632311 | 0.0067371 | 900 | 9.3854682 | 0.0000000 |
| kHAP100 - kHAP30000 | 0.0280619 | 0.0067371 | 900 | 4.1652618 | 0.0011314 |
| kHAP250 - kHAP500 | 0.0037861 | 0.0067371 | 900 | 0.5619798 | 0.9997598 |
| kHAP250 - kHAP1000 | 0.0086702 | 0.0067371 | 900 | 1.2869279 | 0.9351435 |
| kHAP250 - kHAP2500 | 0.0181336 | 0.0067371 | 900 | 2.6915873 | 0.1521771 |
| kHAP250 - kHAP5000 | 0.0417803 | 0.0067371 | 900 | 6.2014961 | 0.0000000 |
| kHAP250 - kHAP10000 | 0.0499347 | 0.0067371 | 900 | 7.4118695 | 0.0000000 |
| kHAP250 - kHAP20000 | 0.0613713 | 0.0067371 | 900 | 9.1094079 | 0.0000000 |
| kHAP250 - kHAP30000 | 0.0262021 | 0.0067371 | 900 | 3.8892016 | 0.0034456 |
| kHAP500 - kHAP1000 | 0.0048841 | 0.0067371 | 900 | 0.7249481 | 0.9984425 |
| kHAP500 - kHAP2500 | 0.0143474 | 0.0067371 | 900 | 2.1296076 | 0.4536061 |
| kHAP500 - kHAP5000 | 0.0379942 | 0.0067371 | 900 | 5.6395163 | 0.0000008 |
| kHAP500 - kHAP10000 | 0.0461486 | 0.0067371 | 900 | 6.8498897 | 0.0000000 |
| kHAP500 - kHAP20000 | 0.0575852 | 0.0067371 | 900 | 8.5474282 | 0.0000000 |
| kHAP500 - kHAP30000 | 0.0224159 | 0.0067371 | 900 | 3.3272218 | 0.0255082 |
| kHAP1000 - kHAP2500 | 0.0094634 | 0.0067371 | 900 | 1.4046595 | 0.8961444 |
| kHAP1000 - kHAP5000 | 0.0331101 | 0.0067371 | 900 | 4.9145682 | 0.0000371 |
| kHAP1000 - kHAP10000 | 0.0412645 | 0.0067371 | 900 | 6.1249416 | 0.0000000 |
| kHAP1000 - kHAP20000 | 0.0527011 | 0.0067371 | 900 | 7.8224801 | 0.0000000 |
| kHAP1000 - kHAP30000 | 0.0175319 | 0.0067371 | 900 | 2.6022737 | 0.1869558 |
| kHAP2500 - kHAP5000 | 0.0236467 | 0.0067371 | 900 | 3.5099088 | 0.0138601 |
| kHAP2500 - kHAP10000 | 0.0318012 | 0.0067371 | 900 | 4.7202821 | 0.0000949 |
| kHAP2500 - kHAP20000 | 0.0432377 | 0.0067371 | 900 | 6.4178206 | 0.0000000 |
| kHAP2500 - kHAP30000 | 0.0080685 | 0.0067371 | 900 | 1.1976143 | 0.9570258 |
| kHAP5000 - kHAP10000 | 0.0081544 | 0.0067371 | 900 | 1.2103734 | 0.9542798 |
| kHAP5000 - kHAP20000 | 0.0195910 | 0.0067371 | 900 | 2.9079118 | 0.0881307 |
| kHAP5000 - kHAP30000 | -0.0155782 | 0.0067371 | 900 | -2.3122945 | 0.3356545 |
| kHAP10000 - kHAP20000 | 0.0114365 | 0.0067371 | 900 | 1.6975385 | 0.7481284 |
| kHAP10000 - kHAP30000 | -0.0237327 | 0.0067371 | 900 | -3.5226678 | 0.0132621 |
| kHAP20000 - kHAP30000 | -0.0351692 | 0.0067371 | 900 | -5.2202063 | 0.0000079 |
These results suggest that there is a statistically significant difference in the accurate assignment of haplogroups between different numbers of included reference haplotypes (k_hap). However, this improvement is tiny; therefore, its biological and practical significance seems small.
These results suggest that there is no statistically significant difference in the accurate assignment of macrohaplogroups between different numbers of included reference haplotypes (k_hap). However, it should be noted that both the genotyped and imputed datasets allow HaploGrep to accurately call macrohaplogroups, with average accuracy in the high 80% range.
There is a slight increase in the ability to accurately call haplogroups when a filter of info > 0.3 is applied, but the biological and practical significance of the improvement again seems small.
HaploGrep haplogroup quality comparisons
We also examined the difference in HaploGrep’s quality score between the truth set, genotyped set, and imputed set.
Here I show the difference between the truth set and the genotyped set:
Here I show the difference between the truth set and the imputed set:
Here I show the difference between the truth set and the imputed set with the info score filter info > 0.3:
Here it appears that relative to the truth set, the quality is still decreased.
However, I have also investigated the difference between the genotyped and imputed datasets to see if there is any improvement. I have only investigated the imputed dataset filtered with info > 0.3.
On average, there is a decrease in HaploGrep quality score.
HaploGrep string distance (Damerau-Levenshtein)
We also examined the distance between the strings in assigned haplogroups, as measures of haplogroup concordance may be misleading if one sub-haplogroup isn’t correctly assigned. We used a few different measures, as different measures of distance will provide different results. All results are between the genotyped dataset and the imputed dataset with an info filter of info > 0.3.
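
To illustrate what such a string distance measures, here is a minimal sketch of the restricted Damerau-Levenshtein distance (also called optimal string alignment), which counts insertions, deletions, substitutions and swaps of adjacent characters. This section does not say which implementation or variant was used for the results, so the function below and the haplogroup labels fed to it are illustrative assumptions only.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i          # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j          # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

# Example labels: a lost sub-haplogroup level costs one edit.
print(osa_distance("H1a1", "H1a"))
print(osa_distance("U5b1", "U5b2"))
```

A call that differs only in the deepest sub-haplogroup level scores a distance of 1, whereas a plain concordance measure would count it as a complete mismatch.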
This result shows the Damerau-Levenshtein distance:
Table showing the residuals for the linear model testing for significant difference in the Damerau-Levenshtein string distance between assigned haplogroups
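
| Term | df | Sum of squares | Mean square | F statistic | p-value |
| --- | --- | --- | --- | --- | --- |
| k_hap | 8 | 7.701951 | 0.9627438 | 20.58027 | 0 |
| Residuals | 900 | 42.101947 | 0.0467799 | NA | NA |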
Table showing the estimated marginal means for the linear model testing for a significant difference in the Damerau-Levenshtein string distance between assigned haplogroups across different numbers of included reference haplotypes (k_hap)

| k_hap | Estimated marginal mean | SE | df | Lower CL | Upper CL |
| --- | --- | --- | --- | --- | --- |
| kHAP100 | 0.2656427 | 0.0215213 | 900 | 0.2234049 | 0.3078805 |
| kHAP250 | 0.2860894 | 0.0215213 | 900 | 0.2438515 | 0.3283272 |
| kHAP500 | 0.3890456 | 0.0215213 | 900 | 0.3468078 | 0.4312835 |
| kHAP1000 | 0.4004665 | 0.0215213 | 900 | 0.3582287 | 0.4427044 |
| kHAP2500 | 0.4276142 | 0.0215213 | 900 | 0.3853763 | 0.4698520 |
| kHAP5000 | 0.3446826 | 0.0215213 | 900 | 0.3024448 | 0.3869205 |
| kHAP10000 | 0.2656466 | 0.0215213 | 900 | 0.2234088 | 0.3078844 |
| kHAP20000 | 0.2287035 | 0.0215213 | 900 | 0.1864657 | 0.2709413 |
| kHAP30000 | 0.1199958 | 0.0215213 | 900 | 0.0777579 | 0.1622336 |
Table showing the contrasts for the linear model testing for a significant difference in the Damerau-Levenshtein string distance between assigned haplogroups across different numbers of included reference haplotypes (k_hap)

| Contrast | Estimate | SE | df | t ratio | p-value |
| --- | --- | --- | --- | --- | --- |
| kHAP100 - kHAP250 | -0.0204467 | 0.0304358 | 900 | -0.6717978 | 0.9991015 |
| kHAP100 - kHAP500 | -0.1234029 | 0.0304358 | 900 | -4.0545368 | 0.0017859 |
| kHAP100 - kHAP1000 | -0.1348238 | 0.0304358 | 900 | -4.4297834 | 0.0003610 |
| kHAP100 - kHAP2500 | -0.1619714 | 0.0304358 | 900 | -5.3217481 | 0.0000046 |
| kHAP100 - kHAP5000 | -0.0790399 | 0.0304358 | 900 | -2.5969422 | 0.1891965 |
| kHAP100 - kHAP10000 | -0.0000039 | 0.0304358 | 900 | -0.0001284 | 1.0000000 |
| kHAP100 - kHAP20000 | 0.0369392 | 0.0304358 | 900 | 1.2136780 | 0.9535486 |
| kHAP100 - kHAP30000 | 0.1456469 | 0.0304358 | 900 | 4.7853882 | 0.0000696 |
| kHAP250 - kHAP500 | -0.1029562 | 0.0304358 | 900 | -3.3827390 | 0.0212838 |
| kHAP250 - kHAP1000 | -0.1143771 | 0.0304358 | 900 | -3.7579855 | 0.0056815 |
| kHAP250 - kHAP2500 | -0.1415248 | 0.0304358 | 900 | -4.6499503 | 0.0001322 |
| kHAP250 - kHAP5000 | -0.0585932 | 0.0304358 | 900 | -1.9251444 | 0.5963696 |
| kHAP250 - kHAP10000 | 0.0204428 | 0.0304358 | 900 | 0.6716695 | 0.9991027 |
| kHAP250 - kHAP20000 | 0.0573859 | 0.0304358 | 900 | 1.8854758 | 0.6240314 |
| kHAP250 - kHAP30000 | 0.1660936 | 0.0304358 | 900 | 5.4571860 | 0.0000022 |
| kHAP500 - kHAP1000 | -0.0114209 | 0.0304358 | 900 | -0.3752465 | 0.9999891 |
| kHAP500 - kHAP2500 | -0.0385685 | 0.0304358 | 900 | -1.2672113 | 0.9405215 |
| kHAP500 - kHAP5000 | 0.0443630 | 0.0304358 | 900 | 1.4575946 | 0.8746543 |
| kHAP500 - kHAP10000 | 0.1233990 | 0.0304358 | 900 | 4.0544085 | 0.0017869 |
| kHAP500 - kHAP20000 | 0.1603421 | 0.0304358 | 900 | 5.2682148 | 0.0000061 |
| kHAP500 - kHAP30000 | 0.2690498 | 0.0304358 | 900 | 8.8399250 | 0.0000000 |
| kHAP1000 - kHAP2500 | -0.0271476 | 0.0304358 | 900 | -0.8919647 | 0.9933839 |
| kHAP1000 - kHAP5000 | 0.0557839 | 0.0304358 | 900 | 1.8328412 | 0.6601847 |
| kHAP1000 - kHAP10000 | 0.1348199 | 0.0304358 | 900 | 4.4296550 | 0.0003612 |
| kHAP1000 - kHAP20000 | 0.1717630 | 0.0304358 | 900 | 5.6434613 | 0.0000008 |
| kHAP1000 - kHAP30000 | 0.2804707 | 0.0304358 | 900 | 9.2151716 | 0.0000000 |
| kHAP2500 - kHAP5000 | 0.0829315 | 0.0304358 | 900 | 2.7248059 | 0.1405427 |
| kHAP2500 - kHAP10000 | 0.1619675 | 0.0304358 | 900 | 5.3216197 | 0.0000046 |
| kHAP2500 - kHAP20000 | 0.1989107 | 0.0304358 | 900 | 6.5354261 | 0.0000000 |
| kHAP2500 - kHAP30000 | 0.3076184 | 0.0304358 | 900 | 10.1071363 | 0.0000000 |
| kHAP5000 - kHAP10000 | 0.0790360 | 0.0304358 | 900 | 2.5968138 | 0.1892507 |
| kHAP5000 - kHAP20000 | 0.1159791 | 0.0304358 | 900 | 3.8106202 | 0.0046595 |
| kHAP5000 - kHAP30000 | 0.2246868 | 0.0304358 | 900 | 7.3823304 | 0.0000000 |
| kHAP10000 - kHAP20000 | 0.0369431 | 0.0304358 | 900 | 1.2138063 | 0.9535201 |
| kHAP10000 - kHAP30000 | 0.1456508 | 0.0304358 | 900 | 4.7855166 | 0.0000695 |
| kHAP20000 - kHAP30000 | 0.1087077 | 0.0304358 | 900 | 3.5717102 | 0.0111737 |
HaploGrep string distance (Levenshtein)
We also examined the string distance between assigned haplogroup labels, because exact-match haplogroup concordance can be misleading when only a sub-haplogroup is assigned incorrectly. We used several distance measures, since each one penalises mismatches differently and so gives different results. All comparisons are between the genotyped dataset and the imputed dataset filtered to info > 0.3.
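To make the measure concrete, here is a minimal Python sketch of the Levenshtein distance between two haplogroup labels. It is not the code used to produce these results; the function name `levenshtein` and the example labels are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion from a
                current[j - 1] + 1,      # insertion into a
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


# Hypothetical (genotyped, imputed) haplogroup labels, for illustration only.
pairs = [("H1a1", "H1a1"), ("H1a1", "H1a"), ("H1a1", "U5b2")]
for truth, imputed in pairs:
    print(truth, imputed, levenshtein(truth, imputed))
```

A label that is wrong only at the deepest sub-haplogroup level scores a small distance, whereas a completely different haplogroup scores a large one. A binary concordance measure would count both simply as mismatches.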
This result shows the Levenshtein distance:
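Table showing the analysis of variance for the linear model testing for a significant difference in the Levenshtein string distance between assigned haplogroups across numbers of included reference haplotypes (k_hap)

| Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| k_hap | 8 | 7.691524 | 0.9614406 | 20.52634 | 0 |
| Residuals | 900 | 42.155417 | 0.0468394 | NA | NA |
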
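As a consistency check, the F statistic reported above can be recomputed from the other entries. The Python sketch below reads the k_hap row as 8 degrees of freedom with sum of squares 7.691524, and the Residuals row as 900 degrees of freedom with sum of squares 42.155417. These column meanings are inferred from the values rather than stated in the source.

```python
# Values copied from the Levenshtein ANOVA output above.
ss_khap, df_khap = 7.691524, 8
ss_resid, df_resid = 42.155417, 900

# A mean square is a sum of squares divided by its degrees of freedom.
ms_khap = ss_khap / df_khap
ms_resid = ss_resid / df_resid

# The F statistic is the ratio of the two mean squares.
f_value = ms_khap / ms_resid

print(f"Mean Sq (k_hap)     = {ms_khap:.7f}")
print(f"Mean Sq (Residuals) = {ms_resid:.7f}")
print(f"F value             = {f_value:.4f}")
```

The recomputed values agree with the reported mean squares (0.9614406 and 0.0468394) and F value (20.52634) up to rounding.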
Table showing the estimated marginal means from the linear model testing for a significant difference in mean Levenshtein string distance between assigned haplogroups across numbers of included reference haplotypes (k_hap)

| k_hap | emmean | SE | df | lower.CL | upper.CL |
|---|---|---|---|---|---|
| kHAP100 | 0.2655997 | 0.021535 | 900 | 0.2233351 | 0.3078644 |
| kHAP250 | 0.2860308 | 0.021535 | 900 | 0.2437661 | 0.3282954 |
| kHAP500 | 0.3889792 | 0.021535 | 900 | 0.3467145 | 0.4312438 |
| kHAP1000 | 0.4004079 | 0.021535 | 900 | 0.3581433 | 0.4426726 |
| kHAP2500 | 0.4276259 | 0.021535 | 900 | 0.3853612 | 0.4698905 |
| kHAP5000 | 0.3446787 | 0.021535 | 900 | 0.3024141 | 0.3869434 |
| kHAP10000 | 0.2659045 | 0.021535 | 900 | 0.2236398 | 0.3081691 |
| kHAP20000 | 0.2288402 | 0.021535 | 900 | 0.1865756 | 0.2711049 |
| kHAP30000 | 0.1201286 | 0.021535 | 900 | 0.0778640 | 0.1623933 |

Table showing the pairwise contrasts from the linear model testing for a significant difference in mean Levenshtein string distance between assigned haplogroups across numbers of included reference haplotypes (k_hap)

| contrast | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|
| kHAP100 - kHAP250 | -0.0204310 | 0.0304551 | 900 | -0.6708585 | 0.9991106 |
| kHAP100 - kHAP500 | -0.1233795 | 0.0304551 | 900 | -4.0511949 | 0.0018104 |
| kHAP100 - kHAP1000 | -0.1348082 | 0.0304551 | 900 | -4.4264599 | 0.0003664 |
| kHAP100 - kHAP2500 | -0.1620261 | 0.0304551 | 900 | -5.3201681 | 0.0000047 |
| kHAP100 - kHAP5000 | -0.0790790 | 0.0304551 | 900 | -2.5965777 | 0.1893504 |
| kHAP100 - kHAP10000 | -0.0003048 | 0.0304551 | 900 | -0.0100071 | 1.0000000 |
| kHAP100 - kHAP20000 | 0.0367595 | 0.0304551 | 900 | 1.2070064 | 0.9550163 |
| kHAP100 - kHAP30000 | 0.1454711 | 0.0304551 | 900 | 4.7765790 | 0.0000726 |
| kHAP250 - kHAP500 | -0.1029484 | 0.0304551 | 900 | -3.3803364 | 0.0214529 |
| kHAP250 - kHAP1000 | -0.1143771 | 0.0304551 | 900 | -3.7556015 | 0.0057323 |
| kHAP250 - kHAP2500 | -0.1415951 | 0.0304551 | 900 | -4.6493096 | 0.0001326 |
| kHAP250 - kHAP5000 | -0.0586479 | 0.0304551 | 900 | -1.9257192 | 0.5959670 |
| kHAP250 - kHAP10000 | 0.0201263 | 0.0304551 | 900 | 0.6608514 | 0.9992029 |
| kHAP250 - kHAP20000 | 0.0571905 | 0.0304551 | 900 | 1.8778649 | 0.6293040 |
| kHAP250 - kHAP30000 | 0.1659021 | 0.0304551 | 900 | 5.4474375 | 0.0000024 |
| kHAP500 - kHAP1000 | -0.0114287 | 0.0304551 | 900 | -0.3752651 | 0.9999891 |
| kHAP500 - kHAP2500 | -0.0386467 | 0.0304551 | 900 | -1.2689733 | 0.9400538 |
| kHAP500 - kHAP5000 | 0.0443005 | 0.0304551 | 900 | 1.4546172 | 0.8759285 |
| kHAP500 - kHAP10000 | 0.1230747 | 0.0304551 | 900 | 4.0411878 | 0.0018853 |
| kHAP500 - kHAP20000 | 0.1601389 | 0.0304551 | 900 | 5.2582013 | 0.0000065 |
| kHAP500 - kHAP30000 | 0.2688506 | 0.0304551 | 900 | 8.8277739 | 0.0000000 |
| kHAP1000 - kHAP2500 | -0.0272180 | 0.0304551 | 900 | -0.8937082 | 0.9932959 |
| kHAP1000 - kHAP5000 | 0.0557292 | 0.0304551 | 900 | 1.8298823 | 0.6621927 |
| kHAP1000 - kHAP10000 | 0.1345034 | 0.0304551 | 900 | 4.4164529 | 0.0003830 |
| kHAP1000 - kHAP20000 | 0.1715677 | 0.0304551 | 900 | 5.6334663 | 0.0000008 |
| kHAP1000 - kHAP30000 | 0.2802793 | 0.0304551 | 900 | 9.2030390 | 0.0000000 |
| kHAP2500 - kHAP5000 | 0.0829472 | 0.0304551 | 900 | 2.7235905 | 0.1409562 |
| kHAP2500 - kHAP10000 | 0.1617214 | 0.0304551 | 900 | 5.3101611 | 0.0000049 |
| kHAP2500 - kHAP20000 | 0.1987856 | 0.0304551 | 900 | 6.5271745 | 0.0000000 |
| kHAP2500 - kHAP30000 | 0.3074972 | 0.0304551 | 900 | 10.0967472 | 0.0000000 |
| kHAP5000 - kHAP10000 | 0.0787742 | 0.0304551 | 900 | 2.5865706 | 0.1936090 |
| kHAP5000 - kHAP20000 | 0.1158385 | 0.0304551 | 900 | 3.8035841 | 0.0047856 |
| kHAP5000 - kHAP30000 | 0.2245501 | 0.0304551 | 900 | 7.3731567 | 0.0000000 |
| kHAP10000 - kHAP20000 | 0.0370642 | 0.0304551 | 900 | 1.2170135 | 0.9528022 |
| kHAP10000 - kHAP30000 | 0.1457759 | 0.0304551 | 900 | 4.7865861 | 0.0000692 |
| kHAP20000 - kHAP30000 | 0.1087116 | 0.0304551 | 900 | 3.5695726 | 0.0112581 |

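Each contrast row follows from the estimated marginal means. The Python sketch below recomputes the first Levenshtein contrast (kHAP100 - kHAP250) from the two reported means and their shared standard error. It assumes a balanced design with a pooled residual variance, so that the standard error of a difference is sqrt(2) times the standard error of a single mean. That assumption is inferred from the numbers, not stated in the source, and the multiplicity-adjusted p-value is not reproduced.

```python
import math

# Values copied from the Levenshtein estimated-marginal-means output.
emmean_khap100 = 0.2655997
emmean_khap250 = 0.2860308
se_emmean = 0.021535  # identical for every k_hap level in the output

# Contrast estimate: difference between the two marginal means.
estimate = emmean_khap100 - emmean_khap250

# Assumed: equal group sizes and pooled variance, so SE(diff) = sqrt(2) * SE(mean).
se_contrast = math.sqrt(2) * se_emmean

# t ratio: estimate divided by its standard error.
t_ratio = estimate / se_contrast

print(f"estimate = {estimate:.7f}")
print(f"SE       = {se_contrast:.7f}")
print(f"t.ratio  = {t_ratio:.4f}")
```

The results agree with the reported row (estimate -0.0204310, SE 0.0304551, t ratio -0.6708585) up to rounding of the inputs.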
HaploGrep string distance (Jaccard)
We also examined the string distance between assigned haplogroup labels, because exact-match haplogroup concordance can be misleading when only a sub-haplogroup is assigned incorrectly. We used several distance measures, since each one penalises mismatches differently and so gives different results. All comparisons are between the genotyped dataset and the imputed dataset filtered to info > 0.3.
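For comparison with the edit-based measure, here is a minimal Python sketch of a Jaccard distance between two haplogroup labels. It treats each label as a set of q-grams and uses single characters (q = 1). Both the choice of q and the example labels are assumptions made for illustration, since the source does not state how the Jaccard distance was configured.

```python
def qgrams(label: str, q: int = 1) -> set:
    """Set of all substrings of length q in the label."""
    return {label[i:i + q] for i in range(len(label) - q + 1)}


def jaccard_distance(a: str, b: str, q: int = 1) -> float:
    """1 - |A ∩ B| / |A ∪ B|, where A and B are the q-gram sets of a and b."""
    set_a, set_b = qgrams(a, q), qgrams(b, q)
    union = set_a | set_b
    if not union:  # two empty labels are treated as identical
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


# Hypothetical (genotyped, imputed) haplogroup labels, for illustration only.
for truth, imputed in [("H1a1", "H1a1"), ("H1a1", "H1a"), ("H1a1", "H1b1")]:
    print(truth, imputed, jaccard_distance(truth, imputed))
```

Because the measure compares sets rather than sequences, two different labels that reuse the same characters (such as "H1a1" and "H1a" with q = 1) have a distance of zero. Under this assumed configuration, Jaccard distances tend to be smaller than edit distances.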
This result shows the Jaccard distance:
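Table showing the analysis of variance for the linear model testing for a significant difference in the Jaccard string distance between assigned haplogroups across numbers of included reference haplotypes (k_hap)

| Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| k_hap | 8 | 0.2274493 | 0.0284312 | 46.83543 | 0 |
| Residuals | 900 | 0.5463395 | 0.0006070 | NA | NA |
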
Table showing the estimated marginal means from the linear model testing for a significant difference in mean Jaccard string distance between assigned haplogroups across numbers of included reference haplotypes (k_hap)

| k_hap | emmean | SE | df | lower.CL | upper.CL |
|---|---|---|---|---|---|
| kHAP100 | 0.0141382 | 0.0024516 | 900 | 0.0093267 | 0.0189497 |
| kHAP250 | 0.0167605 | 0.0024516 | 900 | 0.0119490 | 0.0215720 |
| kHAP500 | 0.0259972 | 0.0024516 | 900 | 0.0211857 | 0.0308087 |
| kHAP1000 | 0.0369160 | 0.0024516 | 900 | 0.0321044 | 0.0417275 |
| kHAP2500 | 0.0447019 | 0.0024516 | 900 | 0.0398904 | 0.0495134 |
| kHAP5000 | 0.0586427 | 0.0024516 | 900 | 0.0538312 | 0.0634542 |
| kHAP10000 | 0.0494711 | 0.0024516 | 900 | 0.0446596 | 0.0542826 |
| kHAP20000 | 0.0430226 | 0.0024516 | 900 | 0.0382111 | 0.0478341 |
| kHAP30000 | 0.0128689 | 0.0024516 | 900 | 0.0080574 | 0.0176804 |

Table showing the pairwise contrasts from the linear model testing for a significant difference in mean Jaccard string distance between assigned haplogroups across numbers of included reference haplotypes (k_hap)

| contrast | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|
| kHAP100 - kHAP250 | -0.0026223 | 0.0034671 | 900 | -0.7563317 | 0.9978929 |
| kHAP100 - kHAP500 | -0.0118590 | 0.0034671 | 900 | -3.4204517 | 0.0187800 |
| kHAP100 - kHAP1000 | -0.0227777 | 0.0034671 | 900 | -6.5697133 | 0.0000000 |
| kHAP100 - kHAP2500 | -0.0305637 | 0.0034671 | 900 | -8.8153859 | 0.0000000 |
| kHAP100 - kHAP5000 | -0.0445045 | 0.0034671 | 900 | -12.8362952 | 0.0000000 |
| kHAP100 - kHAP10000 | -0.0353329 | 0.0034671 | 900 | -10.1909453 | 0.0000000 |
| kHAP100 - kHAP20000 | -0.0288844 | 0.0034671 | 900 | -8.3310394 | 0.0000000 |
| kHAP100 - kHAP30000 | 0.0012693 | 0.0034671 | 900 | 0.3661078 | 0.9999910 |
| kHAP250 - kHAP500 | -0.0092367 | 0.0034671 | 900 | -2.6641200 | 0.1623240 |
| kHAP250 - kHAP1000 | -0.0201555 | 0.0034671 | 900 | -5.8133816 | 0.0000003 |
| kHAP250 - kHAP2500 | -0.0279414 | 0.0034671 | 900 | -8.0590542 | 0.0000000 |
| kHAP250 - kHAP5000 | -0.0418822 | 0.0034671 | 900 | -12.0799635 | 0.0000000 |
| kHAP250 - kHAP10000 | -0.0327106 | 0.0034671 | 900 | -9.4346136 | 0.0000000 |
| kHAP250 - kHAP20000 | -0.0262621 | 0.0034671 | 900 | -7.5747077 | 0.0000000 |
| kHAP250 - kHAP30000 | 0.0038916 | 0.0034671 | 900 | 1.1224395 | 0.9708462 |
| kHAP500 - kHAP1000 | -0.0109188 | 0.0034671 | 900 | -3.1492616 | 0.0444011 |
| kHAP500 - kHAP2500 | -0.0187047 | 0.0034671 | 900 | -5.3949342 | 0.0000031 |
| kHAP500 - kHAP5000 | -0.0326455 | 0.0034671 | 900 | -9.4158435 | 0.0000000 |
| kHAP500 - kHAP10000 | -0.0234739 | 0.0034671 | 900 | -6.7704937 | 0.0000000 |
| kHAP500 - kHAP20000 | -0.0170254 | 0.0034671 | 900 | -4.9105877 | 0.0000378 |
| kHAP500 - kHAP30000 | 0.0131283 | 0.0034671 | 900 | 3.7865595 | 0.0051036 |
| kHAP1000 - kHAP2500 | -0.0077859 | 0.0034671 | 900 | -2.2456726 | 0.3769298 |
| kHAP1000 - kHAP5000 | -0.0217268 | 0.0034671 | 900 | -6.2665819 | 0.0000000 |
| kHAP1000 - kHAP10000 | -0.0125551 | 0.0034671 | 900 | -3.6212320 | 0.0093715 |
| kHAP1000 - kHAP20000 | -0.0061067 | 0.0034671 | 900 | -1.7613261 | 0.7077634 |
| kHAP1000 - kHAP30000 | 0.0240471 | 0.0034671 | 900 | 6.9358211 | 0.0000000 |
| kHAP2500 - kHAP5000 | -0.0139408 | 0.0034671 | 900 | -4.0209093 | 0.0020462 |
| kHAP2500 - kHAP10000 | -0.0047692 | 0.0034671 | 900 | -1.3755595 | 0.9069056 |
| kHAP2500 - kHAP20000 | 0.0016793 | 0.0034671 | 900 | 0.4843465 | 0.9999219 |
| kHAP2500 - kHAP30000 | 0.0318330 | 0.0034671 | 900 | 9.1814937 | 0.0000000 |
| kHAP5000 - kHAP10000 | 0.0091716 | 0.0034671 | 900 | 2.6453498 | 0.1695366 |
| kHAP5000 - kHAP20000 | 0.0156201 | 0.0034671 | 900 | 4.5052558 | 0.0002572 |
| kHAP5000 - kHAP30000 | 0.0457738 | 0.0034671 | 900 | 13.2024030 | 0.0000000 |
| kHAP10000 - kHAP20000 | 0.0064485 | 0.0034671 | 900 | 1.8599059 | 0.6416899 |
| kHAP10000 - kHAP30000 | 0.0366022 | 0.0034671 | 900 | 10.5570532 | 0.0000000 |
| kHAP20000 - kHAP30000 | 0.0301537 | 0.0034671 | 900 | 8.6971472 | 0.0000000 |

Matthews Correlation Coefficient (MCC)
We also determined imputation accuracy using the Matthews correlation coefficient (MCC). The MCC measures the accuracy of the imputed genotypes more directly (as opposed to haplotypes).
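For reference, the standard definition of the MCC from a 2×2 confusion matrix can be sketched in Python as follows. How the confusion matrix was built for each imputed site is not described in this section, so the counts below are made up purely for illustration.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect agreement).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: imputed allele calls compared against true calls.
print(mcc(tp=50, tn=40, fp=5, fn=5))
```

Unlike raw concordance, the MCC uses all four cells of the confusion matrix. It therefore stays informative for rare variants, where always calling the major allele would still give high concordance.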
Table showing the analysis of variance for the linear model testing for a significant difference in the Matthews correlation coefficient across numbers of included reference haplotypes (k_hap)

| Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| k_hap | 8 | 2.017058 | 0.2521323 | 24.0245 | 0 |
| Residuals | 900 | 9.445317 | 0.0104948 | NA | NA |

Table showing the estimated marginal means from the linear model testing for a significant difference in mean Matthews correlation coefficient across numbers of included reference haplotypes (k_hap)

| k_hap | emmean | SE | df | lower.CL | upper.CL |
|---|---|---|---|---|---|
| kHAP100 | 0.8381114 | 0.0101936 | 900 | 0.8181054 | 0.8581173 |
| kHAP250 | 0.8491270 | 0.0101936 | 900 | 0.8291211 | 0.8691330 |
| kHAP500 | 0.8629537 | 0.0101936 | 900 | 0.8429477 | 0.8829596 |
| kHAP1000 | 0.8793764 | 0.0101936 | 900 | 0.8593705 | 0.8993824 |
| kHAP2500 | 0.8585136 | 0.0101936 | 900 | 0.8385077 | 0.8785196 |
| kHAP5000 | 0.8618736 | 0.0101936 | 900 | 0.8418676 | 0.8818795 |
| kHAP10000 | 0.9071607 | 0.0101936 | 900 | 0.8871547 | 0.9271666 |
| kHAP20000 | 0.9660753 | 0.0101936 | 900 | 0.9460694 | 0.9860813 |
| kHAP30000 | 0.9731845 | 0.0101936 | 900 | 0.9531786 | 0.9931905 |

Table showing the pairwise contrasts from the linear model testing for a significant difference in mean Matthews correlation coefficient across numbers of included reference haplotypes (k_hap)

| contrast | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|
| kHAP100 - kHAP250 | -0.0110157 | 0.0144159 | 900 | -0.7641330 | 0.9977339 |
| kHAP100 - kHAP500 | -0.0248423 | 0.0144159 | 900 | -1.7232577 | 0.7321242 |
| kHAP100 - kHAP1000 | -0.0412650 | 0.0144159 | 900 | -2.8624702 | 0.0993893 |
| kHAP100 - kHAP2500 | -0.0204022 | 0.0144159 | 900 | -1.4152609 | 0.8920384 |
| kHAP100 - kHAP5000 | -0.0237622 | 0.0144159 | 900 | -1.6483338 | 0.7775986 |
| kHAP100 - kHAP10000 | -0.0690493 | 0.0144159 | 900 | -4.7898051 | 0.0000681 |
| kHAP100 - kHAP20000 | -0.1279640 | 0.0144159 | 900 | -8.8765937 | 0.0000000 |
| kHAP100 - kHAP30000 | -0.1350732 | 0.0144159 | 900 | -9.3697441 | 0.0000000 |
| kHAP250 - kHAP500 | -0.0138266 | 0.0144159 | 900 | -0.9591247 | 0.9892607 |
| kHAP250 - kHAP1000 | -0.0302494 | 0.0144159 | 900 | -2.0983372 | 0.4750534 |
| kHAP250 - kHAP2500 | -0.0093866 | 0.0144159 | 900 | -0.6511279 | 0.9992848 |
| kHAP250 - kHAP5000 | -0.0127465 | 0.0144159 | 900 | -0.8842008 | 0.9937642 |
| kHAP250 - kHAP10000 | -0.0580336 | 0.0144159 | 900 | -4.0256720 | 0.0020073 |
| kHAP250 - kHAP20000 | -0.1169483 | 0.0144159 | 900 | -8.1124606 | 0.0000000 |
| kHAP250 - kHAP30000 | -0.1240575 | 0.0144159 | 900 | -8.6056111 | 0.0000000 |
| kHAP500 - kHAP1000 | -0.0164228 | 0.0144159 | 900 | -1.1392125 | 0.9680969 |
| kHAP500 - kHAP2500 | 0.0044400 | 0.0144159 | 900 | 0.3079968 | 0.9999977 |
| kHAP500 - kHAP5000 | 0.0010801 | 0.0144159 | 900 | 0.0749239 | 1.0000000 |
| kHAP500 - kHAP10000 | -0.0442070 | 0.0144159 | 900 | -3.0665473 | 0.0566548 |
| kHAP500 - kHAP20000 | -0.1031217 | 0.0144159 | 900 | -7.1533359 | 0.0000000 |
| kHAP500 - kHAP30000 | -0.1102309 | 0.0144159 | 900 | -7.6464864 | 0.0000000 |
| kHAP1000 - kHAP2500 | 0.0208628 | 0.0144159 | 900 | 1.4472093 | 0.8790649 |
| kHAP1000 - kHAP5000 | 0.0175029 | 0.0144159 | 900 | 1.2141364 | 0.9534465 |
| kHAP1000 - kHAP10000 | -0.0277842 | 0.0144159 | 900 | -1.9273348 | 0.5948350 |
| kHAP1000 - kHAP20000 | -0.0866989 | 0.0144159 | 900 | -6.0141234 | 0.0000001 |
| kHAP1000 - kHAP30000 | -0.0938081 | 0.0144159 | 900 | -6.5072739 | 0.0000000 |
| kHAP2500 - kHAP5000 | -0.0033600 | 0.0144159 | 900 | -0.2330729 | 0.9999997 |
| kHAP2500 - kHAP10000 | -0.0486470 | 0.0144159 | 900 | -3.3745441 | 0.0218655 |
| kHAP2500 - kHAP20000 | -0.1075617 | 0.0144159 | 900 | -7.4613327 | 0.0000000 |
| kHAP2500 - kHAP30000 | -0.1146709 | 0.0144159 | 900 | -7.9544832 | 0.0000000 |
| kHAP5000 - kHAP10000 | -0.0452871 | 0.0144159 | 900 | -3.1414712 | 0.0454495 |
| kHAP5000 - kHAP20000 | -0.1042018 | 0.0144159 | 900 | -7.2282598 | 0.0000000 |
| kHAP5000 - kHAP30000 | -0.1113110 | 0.0144159 | 900 | -7.7214103 | 0.0000000 |
| kHAP10000 - kHAP20000 | -0.0589147 | 0.0144159 | 900 | -4.0867886 | 0.0015657 |
| kHAP10000 - kHAP30000 | -0.0660239 | 0.0144159 | 900 | -4.5799391 | 0.0001829 |
| kHAP20000 - kHAP30000 | -0.0071092 | 0.0144159 | 900 | -0.4931505 | 0.9999105 |

IMPUTE2 INFO Score
We also report IMPUTE2’s INFO score. Here I plot INFO scores for both the raw imputed data and the imputed data after INFO score filtering (info > 0.3).
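The effect of the filter on the summary statistic can be illustrated with a short Python sketch. The per-SNP INFO scores below are made up, and the assumption that the filter is applied per SNP before averaging is mine; only the 0.3 threshold comes from the text.

```python
from statistics import mean

# Hypothetical per-SNP INFO scores from one imputation run (illustrative only).
info_scores = [0.95, 0.10, 0.82, 0.25, 0.67]

INFO_CUTOFF = 0.3  # threshold used throughout this document (info > 0.3)

# Keep only SNPs whose INFO score exceeds the cutoff.
kept = [score for score in info_scores if score > INFO_CUTOFF]

mean_info = mean(info_scores)   # analogous to mean_info (raw imputed data)
mean_info_cutoff = mean(kept)   # analogous to mean_info_cutoff (after filtering)

print(f"SNPs imputed: {len(info_scores)}, SNPs kept: {len(kept)}")
print(f"mean INFO before filtering: {mean_info:.3f}")
print(f"mean INFO after filtering:  {mean_info_cutoff:.3f}")
```

Because filtering removes the low-scoring SNPs, the post-filter mean cannot be lower than the raw mean. The two sets of tables below can therefore show different trends across k_hap.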
Table showing the analysis of variance for the linear model testing for a significant difference in the IMPUTE2 INFO score across numbers of included reference haplotypes (k_hap)

| Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| k_hap | 8 | 15.01398 | 1.8767479 | 90.61791 | 0 |
| Residuals | 900 | 18.63951 | 0.0207106 | NA | NA |

Table showing the estimated marginal means from the linear model testing for a significant difference in mean IMPUTE2 INFO score across numbers of included reference haplotypes (k_hap)

| k_hap | emmean | SE | df | lower.CL | upper.CL |
|---|---|---|---|---|---|
| kHAP100 | 0.7569623 | 0.0143197 | 900 | 0.7288583 | 0.7850662 |
| kHAP250 | 0.7423139 | 0.0143197 | 900 | 0.7142099 | 0.7704178 |
| kHAP500 | 0.7253970 | 0.0143197 | 900 | 0.6972930 | 0.7535009 |
| kHAP1000 | 0.6927940 | 0.0143197 | 900 | 0.6646900 | 0.7208980 |
| kHAP2500 | 0.6200950 | 0.0143197 | 900 | 0.5919910 | 0.6481990 |
| kHAP5000 | 0.5426906 | 0.0143197 | 900 | 0.5145866 | 0.5707946 |
| kHAP10000 | 0.4798140 | 0.0143197 | 900 | 0.4517100 | 0.5079180 |
| kHAP20000 | 0.4357204 | 0.0143197 | 900 | 0.4076164 | 0.4638244 |
| kHAP30000 | 0.4147448 | 0.0143197 | 900 | 0.3866408 | 0.4428488 |

Table showing the pairwise contrasts from the linear model testing for a significant difference in mean IMPUTE2 INFO score across numbers of included reference haplotypes (k_hap)

| contrast | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|
| kHAP100 - kHAP250 | 0.0146484 | 0.0202512 | 900 | 0.7233364 | 0.9984671 |
| kHAP100 - kHAP500 | 0.0315653 | 0.0202512 | 900 | 1.5586901 | 0.8268690 |
| kHAP100 - kHAP1000 | 0.0641683 | 0.0202512 | 900 | 3.1686202 | 0.0418853 |
| kHAP100 - kHAP2500 | 0.1368673 | 0.0202512 | 900 | 6.7584842 | 0.0000000 |
| kHAP100 - kHAP5000 | 0.2142717 | 0.0202512 | 900 | 10.5807031 | 0.0000000 |
| kHAP100 - kHAP10000 | 0.2771483 | 0.0202512 | 900 | 13.6855383 | 0.0000000 |
| kHAP100 - kHAP20000 | 0.3212419 | 0.0202512 | 900 | 15.8628736 | 0.0000000 |
| kHAP100 - kHAP30000 | 0.3422175 | 0.0202512 | 900 | 16.8986456 | 0.0000000 |
| kHAP250 - kHAP500 | 0.0169169 | 0.0202512 | 900 | 0.8353537 | 0.9957737 |
| kHAP250 - kHAP1000 | 0.0495199 | 0.0202512 | 900 | 2.4452838 | 0.2607954 |
| kHAP250 - kHAP2500 | 0.1222188 | 0.0202512 | 900 | 6.0351478 | 0.0000001 |
| kHAP250 - kHAP5000 | 0.1996233 | 0.0202512 | 900 | 9.8573667 | 0.0000000 |
| kHAP250 - kHAP10000 | 0.2624998 | 0.0202512 | 900 | 12.9622019 | 0.0000000 |
| kHAP250 - kHAP20000 | 0.3065934 | 0.0202512 | 900 | 15.1395372 | 0.0000000 |
| kHAP250 - kHAP30000 | 0.3275690 | 0.0202512 | 900 | 16.1753092 | 0.0000000 |
| kHAP500 - kHAP1000 | 0.0326030 | 0.0202512 | 900 | 1.6099301 | 0.7994464 |
| kHAP500 - kHAP2500 | 0.1053019 | 0.0202512 | 900 | 5.1997941 | 0.0000088 |
| kHAP500 - kHAP5000 | 0.1827064 | 0.0202512 | 900 | 9.0220130 | 0.0000000 |
| kHAP500 - kHAP10000 | 0.2455829 | 0.0202512 | 900 | 12.1268482 | 0.0000000 |
| kHAP500 - kHAP20000 | 0.2896765 | 0.0202512 | 900 | 14.3041835 | 0.0000000 |
| kHAP500 - kHAP30000 | 0.3106521 | 0.0202512 | 900 | 15.3399556 | 0.0000000 |
| kHAP1000 - kHAP2500 | 0.0726990 | 0.0202512 | 900 | 3.5898640 | 0.0104795 |
| kHAP1000 - kHAP5000 | 0.1501034 | 0.0202512 | 900 | 7.4120829 | 0.0000000 |
| kHAP1000 - kHAP10000 | 0.2129800 | 0.0202512 | 900 | 10.5169181 | 0.0000000 |
| kHAP1000 - kHAP20000 | 0.2570736 | 0.0202512 | 900 | 12.6942534 | 0.0000000 |
| kHAP1000 - kHAP30000 | 0.2780492 | 0.0202512 | 900 | 13.7300255 | 0.0000000 |
| kHAP2500 - kHAP5000 | 0.0774044 | 0.0202512 | 900 | 3.8222189 | 0.0044584 |
| kHAP2500 - kHAP10000 | 0.1402810 | 0.0202512 | 900 | 6.9270541 | 0.0000000 |
| kHAP2500 - kHAP20000 | 0.1843746 | 0.0202512 | 900 | 9.1043894 | 0.0000000 |
| kHAP2500 - kHAP30000 | 0.2053502 | 0.0202512 | 900 | 10.1401615 | 0.0000000 |
| kHAP5000 - kHAP10000 | 0.0628766 | 0.0202512 | 900 | 3.1048352 | 0.0506674 |
| kHAP5000 - kHAP20000 | 0.1069702 | 0.0202512 | 900 | 5.2821705 | 0.0000057 |
| kHAP5000 - kHAP30000 | 0.1279458 | 0.0202512 | 900 | 6.3179425 | 0.0000000 |
| kHAP10000 - kHAP20000 | 0.0440936 | 0.0202512 | 900 | 2.1773353 | 0.4214463 |
| kHAP10000 - kHAP30000 | 0.0650692 | 0.0202512 | 900 | 3.2131074 | 0.0365637 |
| kHAP20000 - kHAP30000 | 0.0209756 | 0.0202512 | 900 | 1.0357721 | 0.9823348 |

Table showing the analysis of variance for the linear model testing for a significant difference in the IMPUTE2 INFO score (following filtering to info > 0.3) across numbers of included reference haplotypes (k_hap)

| Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| k_hap | 8 | 1.687115 | 0.2108894 | 73.78828 | 0 |
| Residuals | 900 | 2.572230 | 0.0028580 | NA | NA |

Table showing the estimated marginal means from the linear model testing for a significant difference in mean IMPUTE2 INFO score (following filtering to info > 0.3) across numbers of included reference haplotypes (k_hap)

| k_hap | emmean | SE | df | lower.CL | upper.CL |
|---|---|---|---|---|---|
| kHAP100 | 0.8443088 | 0.0053195 | 900 | 0.8338687 | 0.8547489 |
| kHAP250 | 0.8430381 | 0.0053195 | 900 | 0.8325980 | 0.8534782 |
| kHAP500 | 0.8422536 | 0.0053195 | 900 | 0.8318135 | 0.8526937 |
| kHAP1000 | 0.8340947 | 0.0053195 | 900 | 0.8236545 | 0.8445348 |
| kHAP2500 | 0.8172612 | 0.0053195 | 900 | 0.8068210 | 0.8277013 |
| kHAP5000 | 0.8196562 | 0.0053195 | 900 | 0.8092161 | 0.8300963 |
| kHAP10000 | 0.8720843 | 0.0053195 | 900 | 0.8616441 | 0.8825244 |
| kHAP20000 | 0.9221291 | 0.0053195 | 900 | 0.9116890 | 0.9325692 |
| kHAP30000 | 0.9478903 | 0.0053195 | 900 | 0.9374502 | 0.9583304 |

Table showing the pairwise contrasts from the linear model testing for a significant difference in mean IMPUTE2 INFO score (following filtering to info > 0.3) across numbers of included reference haplotypes (k_hap)

| contrast | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|
| kHAP100 - kHAP250 | 0.0012707 | 0.0075229 | 900 | 0.1689062 | 1.0000000 |
| kHAP100 - kHAP500 | 0.0020552 | 0.0075229 | 900 | 0.2731930 | 0.9999991 |
| kHAP100 - kHAP1000 | 0.0102141 | 0.0075229 | 900 | 1.3577307 | 0.9131311 |
| kHAP100 - kHAP2500 | 0.0270476 | 0.0075229 | 900 | 3.5953510 | 0.0102775 |
| kHAP100 - kHAP5000 | 0.0246526 | 0.0075229 | 900 | 3.2769847 | 0.0299501 |
| kHAP100 - kHAP10000 | -0.0277755 | 0.0075229 | 900 | -3.6921014 | 0.0072500 |
| kHAP100 - kHAP20000 | -0.0778203 | 0.0075229 | 900 | -10.3443921 | 0.0000000 |
| kHAP100 - kHAP30000 | -0.1035815 | 0.0075229 | 900 | -13.7687410 | 0.0000000 |
| kHAP250 - kHAP500 | 0.0007845 | 0.0075229 | 900 | 0.1042868 | 1.0000000 |
| kHAP250 - kHAP1000 | 0.0089435 | 0.0075229 | 900 | 1.1888245 | 0.9588473 |
| kHAP250 - kHAP2500 | 0.0257770 | 0.0075229 | 900 | 3.4264448 | 0.0184072 |
| kHAP250 - kHAP5000 | 0.0233819 | 0.0075229 | 900 | 3.1080785 | 0.0501858 |
| kHAP250 - kHAP10000 | -0.0290462 | 0.0075229 | 900 | -3.8610076 | 0.0038427 |
| kHAP250 - kHAP20000 | -0.0790910 | 0.0075229 | 900 | -10.5132982 | 0.0000000 |
| kHAP250 - kHAP30000 | -0.1048522 | 0.0075229 | 900 | -13.9376472 | 0.0000000 |
| kHAP500 - kHAP1000 | 0.0081589 | 0.0075229 | 900 | 1.0845377 | 0.9764060 |
| kHAP500 - kHAP2500 | 0.0249924 | 0.0075229 | 900 | 3.3221580 | 0.0259281 |
| kHAP500 - kHAP5000 | 0.0225974 | 0.0075229 | 900 | 3.0037917 | 0.0677508 |
| kHAP500 - kHAP10000 | -0.0298307 | 0.0075229 | 900 | -3.9652944 | 0.0025558 |
| kHAP500 - kHAP20000 | -0.0798755 | 0.0075229 | 900 | -10.6175851 | 0.0000000 |
| kHAP500 - kHAP30000 | -0.1056367 | 0.0075229 | 900 | -14.0419340 | 0.0000000 |
| kHAP1000 - kHAP2500 | 0.0168335 | 0.0075229 | 900 | 2.2376203 | 0.3820684 |
| kHAP1000 - kHAP5000 | 0.0144384 | 0.0075229 | 900 | 1.9192540 | 0.6004931 |
| kHAP1000 - kHAP10000 | -0.0379896 | 0.0075229 | 900 | -5.0498321 | 0.0000189 |
| kHAP1000 - kHAP20000 | -0.0880344 | 0.0075229 | 900 | -11.7021228 | 0.0000000 |
| kHAP1000 - kHAP30000 | -0.1137956 | 0.0075229 | 900 | -15.1264717 | 0.0000000 |
| kHAP2500 - kHAP5000 | -0.0023951 | 0.0075229 | 900 | -0.3183663 | 0.9999970 |
| kHAP2500 - kHAP10000 | -0.0548231 | 0.0075229 | 900 | -7.2874524 | 0.0000000 |
| kHAP2500 - kHAP20000 | -0.1048679 | 0.0075229 | 900 | -13.9397431 | 0.0000000 |
| kHAP2500 - kHAP30000 | -0.1306291 | 0.0075229 | 900 | -17.3640920 | 0.0000000 |
| kHAP5000 - kHAP10000 | -0.0524281 | 0.0075229 | 900 | -6.9690861 | 0.0000000 |
| kHAP5000 - kHAP20000 | -0.1024729 | 0.0075229 | 900 | -13.6213768 | 0.0000000 |
| kHAP5000 - kHAP30000 | -0.1282341 | 0.0075229 | 900 | -17.0457257 | 0.0000000 |
| kHAP10000 - kHAP20000 | -0.0500448 | 0.0075229 | 900 | -6.6522906 | 0.0000000 |
| kHAP10000 - kHAP30000 | -0.0758060 | 0.0075229 | 900 | -10.0766396 | 0.0000000 |
| kHAP20000 - kHAP30000 | -0.0257612 | 0.0075229 | 900 | -3.4243490 | 0.0185368 |